Search Results

Search found 5377 results on 216 pages for 'robert low'.

Page 213/216 | < Previous Page | 209 210 211 212 213 214 215 216 | Next Page >

How do I make my applet turn the user's input into an integer and compare it to the computer's random number?

- by Kitteran

I'm in beginning programming and I don't fully understand applets yet. However, (with some help from internet tutorials) I was able to create an applet that plays a game of guess with the user. The applet compiles fine, but when it runs, this error message appears: "Exception in thread "main" java.lang.NumberFormatException: For input string: "" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48) at java.lang.Integer.parseInt(Integer.java:470) at java.lang.Integer.parseInt(Integer.java:499) at Guess.createUserInterface(Guess.java:101) at Guess.<init>(Guess.java:31) at Guess.main(Guess.java:129)" I've tried moving the "userguess = Integer.parseInt( t1.getText() );" on line 101 to multiple places, but I still get the same error. Can anyone tell me what I'm doing wrong? The Code: // Creates the game GUI. import javax.swing.*; import java.awt.*; import java.awt.event.*; public class Guess extends JFrame{ private JLabel userinputJLabel; private JLabel lowerboundsJLabel; private JLabel upperboundsJLabel; private JLabel computertalkJLabel; private JButton guessJButton; private JPanel guessJPanel; static int computernum; int userguess; static void declare() { computernum = (int) (100 * Math.random()) + 1; //random number picked (1-100) } // no-argument constructor public Guess() { createUserInterface(); } // create and position GUI components private void createUserInterface() { // get content pane and set its layout Container contentPane = getContentPane(); contentPane.setLayout( null ); contentPane.setBackground( Color.white ); // set up userinputJLabel userinputJLabel = new JLabel(); userinputJLabel.setText( "Enter Guess Here -->" ); userinputJLabel.setBounds( 0, 65, 120, 50 ); userinputJLabel.setHorizontalAlignment( JLabel.CENTER ); userinputJLabel.setBackground( Color.white ); userinputJLabel.setOpaque( true ); contentPane.add( userinputJLabel ); // set up lowerboundsJLabel lowerboundsJLabel = new JLabel(); lowerboundsJLabel.setText( "Lower Bounds Of Guess = 1" ); lowerboundsJLabel.setBounds( 0, 0, 170, 50 ); lowerboundsJLabel.setHorizontalAlignment( JLabel.CENTER ); lowerboundsJLabel.setBackground( Color.white ); lowerboundsJLabel.setOpaque( true ); contentPane.add( lowerboundsJLabel ); // set up upperboundsJLabel upperboundsJLabel = new JLabel(); upperboundsJLabel.setText( "Upper Bounds Of Guess = 100" ); upperboundsJLabel.setBounds( 250, 0, 170, 50 ); upperboundsJLabel.setHorizontalAlignment( JLabel.CENTER ); upperboundsJLabel.setBackground( Color.white ); upperboundsJLabel.setOpaque( true ); contentPane.add( upperboundsJLabel ); // set up computertalkJLabel computertalkJLabel = new JLabel(); computertalkJLabel.setText( "Computer Says:" ); computertalkJLabel.setBounds( 0, 130, 100, 50 ); //format (x, y, width, height) computertalkJLabel.setHorizontalAlignment( JLabel.CENTER ); computertalkJLabel.setBackground( Color.white ); computertalkJLabel.setOpaque( true ); contentPane.add( computertalkJLabel ); //Set up guess jbutton guessJButton = new JButton(); guessJButton.setText( "Enter" ); guessJButton.setBounds( 250, 78, 100, 30 ); contentPane.add( guessJButton ); guessJButton.addActionListener( new ActionListener() // anonymous inner class { // event handler called when Guess button is pressed public void actionPerformed( ActionEvent event ) { guessActionPerformed( event ); } } // end anonymous inner class ); // end call to addActionListener // set properties of application's window setTitle( "Guess Game" ); // set title bar text setSize( 500, 500 ); // set window size setVisible( true ); // display window //create text field TextField t1 = new TextField(); // Blank text field for user input t1.setBounds( 135, 78, 100, 30 ); contentPane.add( t1 ); userguess = Integer.parseInt( t1.getText() ); //create section for computertalk Label computertalkLabel = new Label(""); computertalkLabel.setBounds( 115, 130, 300, 50); contentPane.add( computertalkLabel ); } // Display computer reactions to user guess private void guessActionPerformed( ActionEvent event ) { if (userguess > computernum) //if statements (computer's reactions to user guess) computertalkJLabel.setText( "Computer Says: Too High" ); else if (userguess < computernum) computertalkJLabel.setText( "Computer Says: Too Low" ); else if (userguess == computernum) computertalkJLabel.setText( "Computer Says:You Win!" ); else computertalkJLabel.setText( "Computer Says: Error" ); } // end method oneJButtonActionPerformed // end method createUserInterface // main method public static void main( String args[] ) { Guess application = new Guess(); application.setDefaultCloseOperation (JFrame.EXIT_ON_CLOSE); } // end method main } // end class Phone

Read the article
bandwidth throttling C linux

- by bob moch

hi im currently creating a function to create a sleep time i can pause between packets for my port scanner im creating for personal/educational use for my home network. what im currently doing is opening /proc/net/dev and reading the 9th set of digits for the eth0 interface to find out the current packets being set and then reading it again and doing some math to figure out a delay to sleep between sending a packet to a port to identify it and fingerprint it. my problem is that no matter what throttle % i use it always seems to send the same rate of packets. i think its mainly my way of mathematically creating my sleep delay. edit:: dont mind the function declaration and the struct stuff all im doing is spawning this function in a thread and passing a pointer to a struct to the function, recreating the struct locally and then freeing the passed structs memory. void *bandwidthmonitor_cmd(void *param) { char cmdline[1024], *bytedata[19]; int i = 0, ii = 0; long long prevbytes = 0, currentbytes = 0, elapsedbytes = 0, byteusage = 0, maxthrottle = 0; command_struct bandwidth = *((command_struct *)param); free(param); //printf("speed: %d\n throttle: %d\n\n", UPLOAD_SPEED, bandwidth.throttle); maxthrottle = UPLOAD_SPEED * bandwidth.throttle / 100; //printf("max throttle:%lld\n", maxthrottle); FILE *f = fopen("/proc/net/dev", "r"); if(f != NULL) { while(1) { while(fgets(cmdline, sizeof(cmdline), f) != NULL) { cmdline[strlen(cmdline)] = '\0'; if(strncmp(cmdline, " eth0", 6) == 0) { bytedata[0] = strtok(cmdline, " "); while(bytedata[i] != NULL) { i++; bytedata[i] = strtok(NULL, " "); } bytedata[i + 1] = '\0'; currentbytes = atoi(bytedata[9]); } } i = 0; rewind(f); elapsedbytes = currentbytes - prevbytes; prevbytes = currentbytes; byteusage = 8 * (elapsedbytes / 1024); //printf("usage:%lld\n",byteusage); if(ii & 0x40) { SLEEP += (maxthrottle - byteusage) * -1.1;//-2.5; if(SLEEP < 0){ SLEEP = 0; } //printf("sleep:%d\n", SLEEP); } usleep(25000); ii++; } } return NULL; } SLEEP and UPLOAD_SPEED are global variables and UPLOAD_SPEED is in kb/s and generated via a speedtest function that gets the upload speed of my computer. this function is running inside a POSIX thread updating SLEEP which my threads doing the socket work grab to sleep by after every packet. as testing instead of only doing the ports i want to check i make it do all the ports over and over again so i can run dstat on a machine to check bandwidth and no matter what bandwidth.throttle is set to it always seems to generate the same amount of bandwidth to the dstat machine. the way i calculate how much i "should" throttle by is by finding the maximum throttle speed which is defined as maxthrottle = upload_speed * throttle / 100; for example if my upload speed was 1000kb/s and my throttle was 90 (90%) my max throttle would be 900kb/s from there it would find the current bytes sent from /proc/net/dev and then find my sleep time via incrementing or decrementing it via sleep += (maxthrottle - bytesysed) * -1.1; this should in theory increase or decrease the sleep time based on how many bytes used there are. the if(ii & 0x40) statement is just for some moderation control. it makes it so it only sets sleep to a new time every 30-40 iterations. final notes: the main problem is that the sleep timer does not seem to modify the speed of packets being set. or maybe its just my implementation because on a freshly restarted machine where /proc/net/dev has low numbers of bytes sent it seems to raise the sleep timer accordingly on my 60kb/s upload machine (ex if i set the throttle to 2 it will incline the sleep timer until network bandwidth out reaches the max bandwidth threshold, but when i try running it on a server which as been online forever it doesnt seem to work as nicely if at all. if anyone can suggest a new method of monitoring the network to adjust a sleep delay then let me know or if anyone sees a flaw in my code. thank you.

Read the article
nagios NRPE: Unable to read output

- by user555854

I currently set up a script to restart my http servers + php5 fpm but can't get it to work. I have googled and have found that mostly permissions are the problems of my error but can't figure it out. I start my script using /usr/lib/nagios/plugins/check_nrpe -H bart -c restart_http This is the output in my syslog on the node I want to restart Jun 27 06:29:35 bart nrpe[8926]: Connection from 192.168.133.17 port 25028 Jun 27 06:29:35 bart nrpe[8926]: Host address is in allowed_hosts Jun 27 06:29:35 bart nrpe[8926]: Handling the connection... Jun 27 06:29:35 bart nrpe[8926]: Host is asking for command 'restart_http' to be run... Jun 27 06:29:35 bart nrpe[8926]: Running command: /usr/bin/sudo /usr/lib/nagios/plugins/http-restart Jun 27 06:29:35 bart nrpe[8926]: Command completed with return code 1 and output: Jun 27 06:29:35 bart nrpe[8926]: Return Code: 1, Output: NRPE: Unable to read output Jun 27 06:29:35 bart nrpe[8926]: Connection from 192.168.133.17 closed. If I run the command myself it runs fine (but asks for a password) (nagios user) This are the script permission and the script contents. -rwxrwxrwx 1 nagios nagios 142 Jun 26 21:41 /usr/lib/nagios/plugins/http-restart #!/bin/bash echo "ok" /etc/init.d/nginx stop /etc/init.d/nginx start /etc/init.d/php5-fpm stop /etc/init.d/php5-fpm start echo "done" I also added this line to visudo nagios ALL=(ALL) NOPASSWD: /usr/lib/nagios/plugins/ My local nagios nrpe.cfg ############################################################################# # Sample NRPE Config File # Written by: Ethan Galstad ([email protected]) # # # NOTES: # This is a sample configuration file for the NRPE daemon. It needs to be # located on the remote host that is running the NRPE daemon, not the host # from which the check_nrpe client is being executed. ############################################################################# # LOG FACILITY # The syslog facility that should be used for logging purposes. log_facility=daemon # PID FILE # The name of the file in which the NRPE daemon should write it's process ID # number. The file is only written if the NRPE daemon is started by the root # user and is running in standalone mode. pid_file=/var/run/nagios/nrpe.pid # PORT NUMBER # Port number we should wait for connections on. # NOTE: This must be a non-priviledged port (i.e. > 1024). # NOTE: This option is ignored if NRPE is running under either inetd or xinetd server_port=5666 # SERVER ADDRESS # Address that nrpe should bind to in case there are more than one interface # and you do not want nrpe to bind on all interfaces. # NOTE: This option is ignored if NRPE is running under either inetd or xinetd #server_address=127.0.0.1 # NRPE USER # This determines the effective user that the NRPE daemon should run as. # You can either supply a username or a UID. # # NOTE: This option is ignored if NRPE is running under either inetd or xinetd nrpe_user=nagios # NRPE GROUP # This determines the effective group that the NRPE daemon should run as. # You can either supply a group name or a GID. # # NOTE: This option is ignored if NRPE is running under either inetd or xinetd nrpe_group=nagios # ALLOWED HOST ADDRESSES # This is an optional comma-delimited list of IP address or hostnames # that are allowed to talk to the NRPE daemon. # # Note: The daemon only does rudimentary checking of the client's IP # address. I would highly recommend adding entries in your /etc/hosts.allow # file to allow only the specified host to connect to the port # you are running this daemon on. # # NOTE: This option is ignored if NRPE is running under either inetd or xinetd allowed_hosts=127.0.0.1,192.168.133.17 # COMMAND ARGUMENT PROCESSING # This option determines whether or not the NRPE daemon will allow clients # to specify arguments to commands that are executed. This option only works # if the daemon was configured with the --enable-command-args configure script # option. # # *** ENABLING THIS OPTION IS A SECURITY RISK! *** # Read the SECURITY file for information on some of the security implications # of enabling this variable. # # Values: 0=do not allow arguments, 1=allow command arguments dont_blame_nrpe=0 # COMMAND PREFIX # This option allows you to prefix all commands with a user-defined string. # A space is automatically added between the specified prefix string and the # command line from the command definition. # # *** THIS EXAMPLE MAY POSE A POTENTIAL SECURITY RISK, SO USE WITH CAUTION! *** # Usage scenario: # Execute restricted commmands using sudo. For this to work, you need to add # the nagios user to your /etc/sudoers. An example entry for alllowing # execution of the plugins from might be: # # nagios ALL=(ALL) NOPASSWD: /usr/lib/nagios/plugins/ # # This lets the nagios user run all commands in that directory (and only them) # without asking for a password. If you do this, make sure you don't give # random users write access to that directory or its contents! command_prefix=/usr/bin/sudo # DEBUGGING OPTION # This option determines whether or not debugging messages are logged to the # syslog facility. # Values: 0=debugging off, 1=debugging on debug=1 # COMMAND TIMEOUT # This specifies the maximum number of seconds that the NRPE daemon will # allow plugins to finish executing before killing them off. command_timeout=60 # CONNECTION TIMEOUT # This specifies the maximum number of seconds that the NRPE daemon will # wait for a connection to be established before exiting. This is sometimes # seen where a network problem stops the SSL being established even though # all network sessions are connected. This causes the nrpe daemons to # accumulate, eating system resources. Do not set this too low. connection_timeout=300 # WEEK RANDOM SEED OPTION # This directive allows you to use SSL even if your system does not have # a /dev/random or /dev/urandom (on purpose or because the necessary patches # were not applied). The random number generator will be seeded from a file # which is either a file pointed to by the environment valiable $RANDFILE # or $HOME/.rnd. If neither exists, the pseudo random number generator will # be initialized and a warning will be issued. # Values: 0=only seed from /dev/[u]random, 1=also seed from weak randomness #allow_weak_random_seed=1 # INCLUDE CONFIG FILE # This directive allows you to include definitions from an external config file. #include=<somefile.cfg> # INCLUDE CONFIG DIRECTORY # This directive allows you to include definitions from config files (with a # .cfg extension) in one or more directories (with recursion). #include_dir=<somedirectory> #include_dir=<someotherdirectory> # COMMAND DEFINITIONS # Command definitions that this daemon will run. Definitions # are in the following format: # # command[<command_name>]=<command_line> # # When the daemon receives a request to return the results of <command_name> # it will execute the command specified by the <command_line> argument. # # Unlike Nagios, the command line cannot contain macros - it must be # typed exactly as it should be executed. # # Note: Any plugins that are used in the command lines must reside # on the machine that this daemon is running on! The examples below # assume that you have plugins installed in a /usr/local/nagios/libexec # directory. Also note that you will have to modify the definitions below # to match the argument format the plugins expect. Remember, these are # examples only! # The following examples use hardcoded command arguments... command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10 command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20 command[check_hda1]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/hda1 command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 150 -c 200 # The following examples allow user-supplied arguments and can # only be used if the NRPE daemon was compiled with support for # command arguments *AND* the dont_blame_nrpe directive in this # config file is set to '1'. This poses a potential security risk, so # make sure you read the SECURITY file before doing this. #command[check_users]=/usr/lib/nagios/plugins/check_users -w $ARG1$ -c $ARG2$ #command[check_load]=/usr/lib/nagios/plugins/check_load -w $ARG1$ -c $ARG2$ #command[check_disk]=/usr/lib/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$ #command[check_procs]=/usr/lib/nagios/plugins/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$ command[restart_http]=/usr/lib/nagios/plugins/http-restart # # local configuration: # if you'd prefer, you can instead place directives here include=/etc/nagios/nrpe_local.cfg # # you can place your config snipplets into nrpe.d/ include_dir=/etc/nagios/nrpe.d/ My Sudoers files # /etc/sudoers # # This file MUST be edited with the 'visudo' command as root. # # See the man page for details on how to write a sudoers file. # Defaults env_reset # Host alias specification # User alias specification # Cmnd alias specification # User privilege specification root ALL=(ALL) ALL nagios ALL=(ALL) NOPASSWD: /usr/lib/nagios/plugins/ # Allow members of group sudo to execute any command # (Note that later entries override this, so you might need to move # it further down) %sudo ALL=(ALL) ALL # #includedir /etc/sudoers.d Hopefully someone can help!

Read the article
Failing Sata HDD

- by DaveCol

I think my HDD is fried... Could someone confirm or help me restore it? I was using Hardware RAID 1 Configuration [2 x 160GB SATA HDD] on a CentOS 4 Installation. All of a sudden I started seeing bad sectors on the second HDD which stopped being mirrored. I have removed the RAID array and have tested with SMART which showed the following error: 187 Unknown_Attribute 0x003a 001 001 051 Old_age Always FAILING_NOW 4645 I have no clue what this means, or if I can recover from it. Could someone give me some ideas on how to fix this, or what HDD to get to replace this? Complete SMART report: Smartctl version 5.33 [i686-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen Home page is http://smartmontools.sourceforge.net/ === START OF INFORMATION SECTION === Device Model: GB0160CAABV Serial Number: 6RX58NAA Firmware Version: HPG1 User Capacity: 160,041,885,696 bytes Device is: Not in smartctl database [for details use: -P showall] ATA Version is: 7 ATA Standard is: ATA/ATAPI-7 T13 1532D revision 4a Local Time is: Tue Oct 19 13:42:42 2010 COT SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED See vendor-specific Attribute list for marginal Attributes. General SMART Values: Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: ( 433) seconds. Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 2) minutes. Extended self-test routine recommended polling time: ( 54) minutes. SMART Attributes Data Structure revision number: 10 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x000f 100 253 006 Pre-fail Always - 0 3 Spin_Up_Time 0x0002 097 097 000 Old_age Always - 0 4 Start_Stop_Count 0x0033 100 100 020 Pre-fail Always - 152 5 Reallocated_Sector_Ct 0x0033 095 095 036 Pre-fail Always - 214 7 Seek_Error_Rate 0x000f 078 060 030 Pre-fail Always - 73109713 9 Power_On_Hours 0x0032 083 083 000 Old_age Always - 15133 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0 12 Power_Cycle_Count 0x0033 100 100 020 Pre-fail Always - 154 184 Unknown_Attribute 0x0032 038 038 000 Old_age Always - 62 187 Unknown_Attribute 0x003a 001 001 051 Old_age Always FAILING_NOW 4645 189 Unknown_Attribute 0x0022 100 100 000 Old_age Always - 0 190 Unknown_Attribute 0x001a 061 055 000 Old_age Always - 656408615 194 Temperature_Celsius 0x0000 039 045 000 Old_age Offline - 39 (Lifetime Min/Max 0/22) 195 Hardware_ECC_Recovered 0x0032 070 059 000 Old_age Always - 12605265 197 Current_Pending_Sector 0x0000 100 100 000 Old_age Offline - 1 198 Offline_Uncorrectable 0x0000 100 100 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x0000 200 200 000 Old_age Offline - 62 SMART Error Log Version: 1 ATA Error Count: 4645 (device log contains only the most recent five errors) CR = Command Register [HEX] FR = Features Register [HEX] SC = Sector Count Register [HEX] SN = Sector Number Register [HEX] CL = Cylinder Low Register [HEX] CH = Cylinder High Register [HEX] DH = Device/Head Register [HEX] DC = Device Command Register [HEX] ER = Error register [HEX] ST = Status register [HEX] Powered_Up_Time is measured from power on, and printed as DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days. Error 4645 occurred at disk power-on lifetime: 15132 hours (630 days + 12 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 7b 86 b1 ea Error: UNC at LBA = 0x0ab1867b = 179406459 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 02 7b 86 b1 ea 00 00:38:52.796 READ DMA ec 03 45 00 00 00 a0 00 00:38:52.796 IDENTIFY DEVICE ef 03 45 00 00 00 a0 00 00:38:52.794 SET FEATURES [Set transfer mode] ec 00 00 7b 86 b1 a0 00 00:38:49.991 IDENTIFY DEVICE c8 00 04 79 86 b1 ea 00 00:38:49.935 READ DMA Error 4644 occurred at disk power-on lifetime: 15132 hours (630 days + 12 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 7b 86 b1 ea Error: UNC at LBA = 0x0ab1867b = 179406459 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 04 79 86 b1 ea 00 00:38:41.517 READ DMA ec 03 45 00 00 00 a0 00 00:38:41.515 IDENTIFY DEVICE ef 03 45 00 00 00 a0 00 00:38:41.515 SET FEATURES [Set transfer mode] ec 00 00 7b 86 b1 a0 00 00:38:49.991 IDENTIFY DEVICE c8 00 06 77 86 b1 ea 00 00:38:49.935 READ DMA Error 4643 occurred at disk power-on lifetime: 15132 hours (630 days + 12 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 7b 86 b1 ea Error: UNC at LBA = 0x0ab1867b = 179406459 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 06 77 86 b1 ea 00 00:38:41.517 READ DMA ec 03 45 00 00 00 a0 00 00:38:41.515 IDENTIFY DEVICE ef 03 45 00 00 00 a0 00 00:38:41.515 SET FEATURES [Set transfer mode] ec 00 00 7b 86 b1 a0 00 00:38:41.513 IDENTIFY DEVICE c8 00 06 77 86 b1 ea 00 00:38:38.706 READ DMA Error 4642 occurred at disk power-on lifetime: 15132 hours (630 days + 12 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 7b 86 b1 ea Error: UNC at LBA = 0x0ab1867b = 179406459 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 06 77 86 b1 ea 00 00:38:41.517 READ DMA ec 03 45 00 00 00 a0 00 00:38:41.515 IDENTIFY DEVICE ef 03 45 00 00 00 a0 00 00:38:41.515 SET FEATURES [Set transfer mode] ec 00 00 7b 86 b1 a0 00 00:38:41.513 IDENTIFY DEVICE c8 00 06 77 86 b1 ea 00 00:38:38.706 READ DMA Error 4641 occurred at disk power-on lifetime: 15132 hours (630 days + 12 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 7b 86 b1 ea Error: UNC at LBA = 0x0ab1867b = 179406459 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 06 77 86 b1 ea 00 00:38:41.517 READ DMA ec 03 45 00 00 00 a0 00 00:38:41.515 IDENTIFY DEVICE ef 03 45 00 00 00 a0 00 00:38:41.515 SET FEATURES [Set transfer mode] ec 00 00 7b 86 b1 a0 00 00:38:41.513 IDENTIFY DEVICE c8 00 06 77 86 b1 ea 00 00:38:38.706 READ DMA SMART Self-test log structure revision number 1 Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Short offline Completed without error 00% 15131 - # 2 Short offline Completed without error 00% 15131 - SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay.

Read the article
Xenserver 5.5 U2 a bit unstable with an unstable W2003 VM

- by twistedbrain

In the last week I had to reboot the host system twice and the second one by means of the power button. The system is a Dell PE 6950 (4 Opteron dual core, 2,8 Ghz, 16 GB RAM, 900 GB disk in a RAID 10 array composed by 4 450GB 15000 rpm disks) with XenServer 5.5 U2. We're installing it and at now there are working in production a 2003 32 bit Windows server VM with 2 GB RAM, 3 vcpu and about 200 GB disk in 4 partition (12 GB boot, 20 GB program, 80 GB user data, 80 GB other data). The first time I was compressing many Windows 2003 folders (some tens GB by means of W2003 compressed folder option) from a Windows remote console and some hours before my colleague installed Backup Exec agent that was alredy installed and that required a reboot (that was pending). The console stopped responding, it was no more possible to connect by means of remote console or by means of the console of the XenCentre, it was still possible from the network to use the shared folder of the VM and the programs on it (2 db and a GIS program), but the print server didn't work any more and I couldn't give remote reboot from other domain controller hosts. I couldn't stop the virtual machine neither from the XenCentre, neither from the command line of the host also forcing the reboot. I had to reboot the host server. Yesterday it has been worst. I installed a template of another VM, CentOS 5.3 and then, put the DVD in the drive of the host. Then, before the install and after the boot I checked for defect the DVD and the W2003 VM began to respond slowly (I was connected by means of an administration remote console) the task manager showed only mid or low load, it is, only the first of the 3 vcpu was loaded (about 70%), while the other two were about at 20% and also the disk I/O was not so heavy. Then the users were not so happy because they couldn't any more use the MS word docs on the server. I immediately stopped the check of the DVD (to do that I had to force the stop of such Centos 5.3 VM), then some users could again use their docs, but other had still problems, so I decided to reboot the VM, but it doesn't stopped, neither from XenCenter, neither from command line, neither forcing the reboot. Then I tried to reboot of the host, but it didn't worked, neither from the XenCentre, neither from the host prompt (shutdown -r now as root: it told that it was shutting down, but then it didn't did that). So I had to power off by means of the power button of the server (before I tried some Magic SyS REQ, but I saw that Xen isn't compiled with this option enabled). What could you suggest about my problem, what can I look, search and see? In the W2003 VM logs there are no errors or warning to explain what happened. Some more exciting, amusing and inspiring words of poetry (the facts happened around 11 am): \# egrep -i 'err|warning' xensource.log [20100219 10:32:05.597|debug|culo|6301 unix-RPC|VBD.plug R:c81bcda701f6|xenops] watch: watching xenstore paths: [ /xapi/0/frontend/vbd/51712/hotplug; /local/domain/0/backend/vbd/0/51712/tapdisk-error ] with timeout 1200.000000 seconds [20100219 10:32:05.597|debug|culo|6301 unix-RPC|VBD.plug R:c81bcda701f6|xenops] watch: fired on /local/domain/0/backend/vbd/0/51712/tapdisk-error [20100219 10:32:14.314|debug|culo|6335 unix-RPC|VBD.unplug R:9258f54578d6|xenops] watch: watching xenstore paths: [ /local/domain/0/backend/vbd/0/51712/shutdown-done; /local/domain/0/error/device/vbd/51712/error ] with timeout 1200.000000 seconds [20100219 10:32:14.337|debug|culo|6335 unix-RPC|VBD.unplug R:9258f54578d6|xenops] xenstore-rm /local/domain/0/error/backend/vbd/0 [20100219 10:32:14.337|debug|culo|6335 unix-RPC|VBD.unplug R:9258f54578d6|xenops] xenstore-rm /local/domain/0/error/device/vbd/51712 [20100219 10:32:14.338|debug|culo|6335 unix-RPC|VBD.unplug R:9258f54578d6|xenops] watch: fired on /local/domain/0/error/device/vbd/51712/error [20100219 10:53:48.903|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|helpers] Ignoring exception: INTERNAL_ERROR: [ Xb.Noent ] while Vmops.destroy_domain: Destroying domid 14 guest session [20100219 10:53:52.048|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/0/error/backend/tap/14 [20100219 10:53:52.048|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/14/error/device/vbd/51744 [20100219 10:53:52.085|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/0/error/backend/tap/14 [20100219 10:53:52.086|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/14/error/device/vbd/51728 [20100219 10:53:52.122|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/0/error/backend/tap/14 [20100219 10:53:52.122|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/14/error/device/vbd/51712 [20100219 10:53:52.127|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/0/error/backend/vbd/14 [20100219 10:53:52.128|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/14/error/device/vbd/51760 [20100219 10:53:52.496|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] Device.Vif.hard_shutdown about to blow away backend and error paths [20100219 10:53:52.497|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/0/error/backend/vif/14 [20100219 10:53:52.497|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/14/error/device/vif/0 [20100219 10:53:53.385|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] Device.Vif.hard_shutdown about to blow away backend and error paths [20100219 10:53:53.386|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/0/error/backend/vif/14 [20100219 10:53:53.386|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/14/error/device/vif/1 [20100219 10:53:53.389| warn|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|hotplug] Warning, deleting 'vif' entry from /xapi/14/hotplug/vif/0 [20100219 10:53:53.391| warn|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|hotplug] Warning, deleting 'vif' entry from /xapi/14/hotplug/vif/1 [20100219 11:20:49.766|debug|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|helpers] Ignoring exception: INTERNAL_ERROR: [ Xb.Noent ] while Vmops.destroy_domain: Destroying domid 11 guest session [20100219 11:20:50.339|debug|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|xenops] xenstore-rm /local/domain/0/error/backend/tap/11 [20100219 11:20:50.339|debug|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|xenops] xenstore-rm /local/domain/11/error/device/vbd/832 [20100219 11:20:50.360|debug|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|xenops] xenstore-rm /local/domain/0/error/backend/tap/11 [20100219 11:20:50.360|debug|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|xenops] xenstore-rm /local/domain/11/error/device/vbd/768 [20100219 11:20:50.366|debug|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|xenops] xenstore-rm /local/domain/0/error/backend/vbd/11 [20100219 11:20:50.366|debug|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|xenops] xenstore-rm /local/domain/11/error/device/vbd/5696 [20100219 11:20:50.753|debug|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|xenops] Device.Vif.hard_shutdown about to blow away backend and error paths [20100219 11:20:50.754|debug|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|xenops] xenstore-rm /local/domain/0/error/backend/vif/11 [20100219 11:20:50.754|debug|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|xenops] xenstore-rm /local/domain/11/error/device/vif/1 [20100219 11:20:50.757| warn|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|hotplug] Warning, deleting 'vif' entry from /xapi/11/hotplug/vif/1 [20100219 11:28:13.803|debug|culo|6610 inet-RPC|Connection to VM console R:e9f8b76e8975|console] error: INTERNAL_ERROR: [ Unix.Unix_error(63, "connect", "") ]

Read the article
Hard Disk DRDY error: is it a crash

- by pranjal

I am using IBM Thinkpad, 1.7GHz, 512 RAM with Linux Mint 9 installed. I have two partitions in addition to root. One of the partitions became read-only yesterday, after which I rebooted my system. It is extremely slow along with DRDY Error : Is my Hard disk crashed ? Error Log while booting. Differences between boot sector and its backup. failed command : READ DMA BMDMA : stat 0X25 ata 1.00 : status : { DRDY ERR } ata 1.00 : status :{ UNC } Buffer I/O error on logical device, logical block 65467 smartctl output for the partition: mint mint # smartctl -a /dev/sda1 smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen Home page is http://smartmontools.sourceforge.net/ === START OF INFORMATION SECTION === Device Model: TOSHIBA MK4026GAX RoHS Serial Number: X5LY1623T Firmware Version: PA107E User Capacity: 40,007,761,920 bytes Device is: Not in smartctl database [for details use: -P showall] ATA Version is: 6 ATA Standard is: Exact ATA specification draft version not indicated Local Time is: Thu Feb 17 06:48:25 2011 UTC SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x84) Offline data collection activity was suspended by an interrupting command from host. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: ( 153) seconds. Offline data collection capabilities: (0x1b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. No Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. No General Purpose Logging support. Short self-test routine recommended polling time: ( 2) minutes. Extended self-test routine recommended polling time: ( 30) minutes. SMART Attributes Data Structure revision number: 16 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x000b 100 100 050 Pre-fail Always - 0 2 Throughput_Performance 0x0005 100 100 050 Pre-fail Offline - 0 3 Spin_Up_Time 0x0027 100 100 001 Pre-fail Always - 310 4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 3968 5 Reallocated_Sector_Ct 0x0033 100 100 050 Pre-fail Always - 40 7 Seek_Error_Rate 0x000b 100 100 050 Pre-fail Always - 0 8 Seek_Time_Performance 0x0005 100 100 050 Pre-fail Offline - 0 9 Power_On_Hours 0x0032 082 082 000 Old_age Always - 7257 10 Spin_Retry_Count 0x0033 179 100 030 Pre-fail Always - 0 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 3484 192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 489 193 Load_Cycle_Count 0x0032 064 064 000 Old_age Always - 367150 194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 36 (Lifetime Min/Max 14/57) 196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 33 197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 82 198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 1 199 UDMA_CRC_Error_Count 0x0032 200 253 000 Old_age Always - 0 220 Disk_Shift 0x0002 100 100 000 Old_age Always - 101 222 Loaded_Hours 0x0032 085 085 000 Old_age Always - 6146 223 Load_Retry_Count 0x0032 100 100 000 Old_age Always - 0 224 Load_Friction 0x0022 100 100 000 Old_age Always - 0 226 Load-in_Time 0x0026 100 100 000 Old_age Always - 227 240 Head_Flying_Hours 0x0001 100 100 001 Pre-fail Offline - 0 SMART Error Log Version: 1 ATA Error Count: 2371 (device log contains only the most recent five errors) CR = Command Register [HEX] FR = Features Register [HEX] SC = Sector Count Register [HEX] SN = Sector Number Register [HEX] CL = Cylinder Low Register [HEX] CH = Cylinder High Register [HEX] DH = Device/Head Register [HEX] DC = Device Command Register [HEX] ER = Error register [HEX] ST = Status register [HEX] Powered_Up_Time is measured from power on, and printed as DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days. Error 2371 occurred at disk power-on lifetime: 7256 hours (302 days + 8 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 05 1a 1b 00 e0 Error: UNC 5 sectors at LBA = 0x00001b1a = 6938 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 05 1a 1b 00 e0 00 00:03:10.061 READ DMA f8 00 00 00 00 00 e0 00 00:03:10.061 READ NATIVE MAX ADDRESS ec 00 00 00 00 00 a0 02 00:03:10.053 IDENTIFY DEVICE ef 03 45 00 00 00 a0 02 00:03:10.053 SET FEATURES [Set transfer mode] f8 00 00 00 00 00 e0 00 00:03:10.053 READ NATIVE MAX ADDRESS Error 2370 occurred at disk power-on lifetime: 7256 hours (302 days + 8 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 05 1a 1b 00 e0 Error: UNC 5 sectors at LBA = 0x00001b1a = 6938 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 05 1a 1b 00 e0 00 00:03:03.328 READ DMA f8 00 00 00 00 00 e0 00 00:03:03.327 READ NATIVE MAX ADDRESS ec 00 00 00 00 00 a0 02 00:03:03.320 IDENTIFY DEVICE ef 03 45 00 00 00 a0 02 00:03:03.319 SET FEATURES [Set transfer mode] f8 00 00 00 00 00 e0 00 00:03:03.319 READ NATIVE MAX ADDRESS Error 2369 occurred at disk power-on lifetime: 7256 hours (302 days + 8 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 05 1a 1b 00 e0 Error: UNC 5 sectors at LBA = 0x00001b1a = 6938 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 05 1a 1b 00 e0 00 00:02:56.582 READ DMA f8 00 00 00 00 00 e0 00 00:02:56.582 READ NATIVE MAX ADDRESS ec 00 00 00 00 00 a0 02 00:02:56.574 IDENTIFY DEVICE ef 03 45 00 00 00 a0 02 00:02:56.574 SET FEATURES [Set transfer mode] f8 00 00 00 00 00 e0 00 00:02:56.574 READ NATIVE MAX ADDRESS Error 2368 occurred at disk power-on lifetime: 7256 hours (302 days + 8 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 05 1a 1b 00 e0 Error: UNC 5 sectors at LBA = 0x00001b1a = 6938 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 05 1a 1b 00 e0 00 00:02:49.809 READ DMA f8 00 00 00 00 00 e0 00 00:02:49.809 READ NATIVE MAX ADDRESS ec 00 00 00 00 00 a0 02 00:02:49.801 IDENTIFY DEVICE ef 03 45 00 00 00 a0 02 00:02:49.801 SET FEATURES [Set transfer mode] f8 00 00 00 00 00 e0 00 00:02:49.801 READ NATIVE MAX ADDRESS Error 2367 occurred at disk power-on lifetime: 7256 hours (302 days + 8 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 05 1a 1b 00 e0 Error: UNC 5 sectors at LBA = 0x00001b1a = 6938 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 05 1a 1b 00 e0 00 00:02:43.056 READ DMA f8 00 00 00 00 00 e0 00 00:02:43.056 READ NATIVE MAX ADDRESS ec 00 00 00 00 00 a0 02 00:02:43.048 IDENTIFY DEVICE ef 03 45 00 00 00 a0 02 00:02:43.048 SET FEATURES [Set transfer mode] f8 00 00 00 00 00 e0 00 00:02:43.047 READ NATIVE MAX ADDRESS SMART Self-test log structure revision number 1 No self-tests have been logged. [To run self-tests, use: smartctl -t] Device does not support Selective Self Tests/Logging Do I need to get a new Hard Disk my PC ?

Read the article
solved: puppet master REST API returns 403 when running under passenger works when master runs from command line

- by Anadi Misra

I am using the standard auth.conf provided in puppet install for the puppet master which is running through passenger under Nginx. However for most of the catalog, files and certitifcate request I get a 403 response. ### Authenticated paths - these apply only when the client ### has a valid certificate and is thus authenticated # allow nodes to retrieve their own catalog path ~ ^/catalog/([^/]+)$ method find allow $1 # allow nodes to retrieve their own node definition path ~ ^/node/([^/]+)$ method find allow $1 # allow all nodes to access the certificates services path ~ ^/certificate_revocation_list/ca method find allow * # allow all nodes to store their reports path /report method save allow * # unconditionally allow access to all file services # which means in practice that fileserver.conf will # still be used path /file allow * ### Unauthenticated ACL, for clients for which the current master doesn't ### have a valid certificate; we allow authenticated users, too, because ### there isn't a great harm in letting that request through. # allow access to the master CA path /certificate/ca auth any method find allow * path /certificate/ auth any method find allow * path /certificate_request auth any method find, save allow * path /facts auth any method find, search allow * # this one is not stricly necessary, but it has the merit # of showing the default policy, which is deny everything else path / auth any Puppet master however does not seems to be following this as I get this error on client [amisr1@blramisr195602 ~]$ sudo puppet agent --no-daemonize --verbose --server bangvmpllda02.XXXXX.com [sudo] password for amisr1: Starting Puppet client version 3.0.1 Warning: Unable to fetch my node definition, but the agent run will continue: Warning: Error 403 on SERVER: Forbidden request: XX.XXX.XX.XX(XX.XXX.XX.XX) access to /certificate_revocation_list/ca [find] at :110 Info: Retrieving plugin Error: /File[/var/lib/puppet/lib]: Failed to generate additional resources using 'eval_generate: Error 403 on SERVER: Forbidden request: XX.XXX.XX.XX(XX.XXX.XX.XX) access to /file_metadata/plugins [search] at :110 Error: /File[/var/lib/puppet/lib]: Could not evaluate: Error 403 on SERVER: Forbidden request: XX.XXX.XX.XX(XX.XXX.XX.XX) access to /file_metadata/plugins [find] at :110 Could not retrieve file metadata for puppet://devops.XXXXX.com/plugins: Error 403 on SERVER: Forbidden request: XX.XXX.XX.XX(XX.XXX.XX.XX) access to /file_metadata/plugins [find] at :110 Error: Could not retrieve catalog from remote server: Error 403 on SERVER: Forbidden request: XX.XXX.XX.XX(XX.XXX.XX.XX) access to /catalog/blramisr195602.XXXXX.com [find] at :110 Using cached catalog Error: Could not retrieve catalog; skipping run Error: Could not send report: Error 403 on SERVER: Forbidden request: XX.XXX.XX.XX(XX.XXX.XX.XX) access to /report/blramisr195602.XXXXX.com [save] at :110 and the server logs show XX.XXX.XX.XX - - [10/Dec/2012:14:46:52 +0530] "GET /production/certificate_revocation_list/ca? HTTP/1.1" 403 102 "-" "Ruby" XX.XXX.XX.XX - - [10/Dec/2012:14:46:52 +0530] "GET /production/file_metadatas/plugins?links=manage&recurse=true&&ignore=---+%0A++-+%22.svn%22%0A++-+CVS%0A++-+%22.git%22&checksum_type=md5 HTTP/1.1" 403 95 "-" "Ruby" XX.XXX.XX.XX - - [10/Dec/2012:14:46:52 +0530] "GET /production/file_metadata/plugins? HTTP/1.1" 403 93 "-" "Ruby" XX.XXX.XX.XX - - [10/Dec/2012:14:46:53 +0530] "POST /production/catalog/blramisr195602.XXXXX.com HTTP/1.1" 403 106 "-" "Ruby" XX.XXX.XX.XX - - [10/Dec/2012:14:46:53 +0530] "PUT /production/report/blramisr195602.XXXXX.com HTTP/1.1" 403 105 "-" "Ruby" thefile server conf file is as follows (and goin by what they say on puppet site, It is better to regulate access in auth.conf for reaching file server and then allow file server to server all) [files] path /apps/puppet/files allow * [private] path /apps/puppet/private/%H allow * [modules] allow * I am using server and client version 3 Nginx has been compiled using the following options nginx version: nginx/1.3.9 built by gcc 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC) TLS SNI support enabled configure arguments: --prefix=/apps/nginx --conf-path=/apps/nginx/nginx.conf --pid-path=/apps/nginx/run/nginx.pid --error-log-path=/apps/nginx/logs/error.log --http-log-path=/apps/nginx/logs/access.log --with-http_ssl_module --with-http_gzip_static_module --add-module=/usr/lib/ruby/gems/1.8/gems/passenger-3.0.18/ext/nginx --add-module=/apps/Downloads/nginx/nginx-auth-ldap-master/ and the standard nginx puppet master conf server { ssl on; listen 8140 ssl; server_name _; passenger_enabled on; passenger_set_cgi_param HTTP_X_CLIENT_DN $ssl_client_s_dn; passenger_set_cgi_param HTTP_X_CLIENT_VERIFY $ssl_client_verify; passenger_min_instances 5; access_log logs/puppet_access.log; error_log logs/puppet_error.log; root /apps/nginx/html/rack/public; ssl_certificate /var/lib/puppet/ssl/certs/bangvmpllda02.XXXXXX.com.pem; ssl_certificate_key /var/lib/puppet/ssl/private_keys/bangvmpllda02.XXXXXX.com.pem; ssl_crl /var/lib/puppet/ssl/ca/ca_crl.pem; ssl_client_certificate /var/lib/puppet/ssl/certs/ca.pem; ssl_ciphers SSLv2:-LOW:-EXPORT:RC4+RSA; ssl_prefer_server_ciphers on; ssl_verify_client optional; ssl_verify_depth 1; ssl_session_cache shared:SSL:128m; ssl_session_timeout 5m; } Puppet is picking up the correct settings from the files mentioned because config print command points to /etc/puppet [amisr1@bangvmpllDA02 puppet]$ sudo puppet config print | grep conf async_storeconfigs = false authconfig = /etc/puppet/namespaceauth.conf autosign = /etc/puppet/autosign.conf catalog_cache_terminus = store_configs confdir = /etc/puppet config = /etc/puppet/puppet.conf config_file_name = puppet.conf config_version = "" configprint = all configtimeout = 120 dblocation = /var/lib/puppet/state/clientconfigs.sqlite3 deviceconfig = /etc/puppet/device.conf fileserverconfig = /etc/puppet/fileserver.conf genconfig = false hiera_config = /etc/puppet/hiera.yaml localconfig = /var/lib/puppet/state/localconfig name = config rest_authconfig = /etc/puppet/auth.conf storeconfigs = true storeconfigs_backend = puppetdb tagmap = /etc/puppet/tagmail.conf thin_storeconfigs = false I checked the firewall rules on this VM; 80, 443, 8140, 3000 are allowed. Do I still have to tweak any specifics to auth.conf for getting this to work? Update I added verbose logging to the puppet master and restarted nginx; here's the additional info I see in logs Mon Dec 10 18:19:15 +0530 2012 Puppet (err): Could not resolve 10.209.47.31: no name for 10.209.47.31 Mon Dec 10 18:19:15 +0530 2012 access[/] (info): defaulting to no access for 10.209.47.31 Mon Dec 10 18:19:15 +0530 2012 Puppet (warning): Denying access: Forbidden request: 10.209.47.31(10.209.47.31) access to /file_metadata/plugins [find] at :111 Mon Dec 10 18:19:15 +0530 2012 Puppet (err): Forbidden request: 10.209.47.31(10.209.47.31) access to /file_metadata/plugins [find] at :111 10.209.47.31 - - [10/Dec/2012:18:19:15 +0530] "GET /production/file_metadata/plugins? HTTP/1.1" 403 93 "-" "Ruby" On the agent machine facter fqdn and hostname both return a fully qualified host name [amisr1@blramisr195602 ~]$ sudo facter fqdn blramisr195602.XXXXXXX.com I then updated the agent configuration to add dns_alt_names = 10.209.47.31 cleaned all certificates on master and agent and regenerated the certificates and signed them on master using the option --allow-dns-alt-names [amisr1@bangvmpllDA02 ~]$ sudo puppet cert sign blramisr195602.XXXXXX.com Error: CSR 'blramisr195602.XXXXXX.com' contains subject alternative names (DNS:10.209.47.31, DNS:blramisr195602.XXXXXX.com), which are disallowed. Use `puppet cert --allow-dns-alt-names sign blramisr195602.XXXXXX.com` to sign this request. [amisr1@bangvmpllDA02 ~]$ sudo puppet cert --allow-dns-alt-names sign blramisr195602.XXXXXX.com Signed certificate request for blramisr195602.XXXXXX.com Removing file Puppet::SSL::CertificateRequest blramisr195602.XXXXXX.com at '/var/lib/puppet/ssl/ca/requests/blramisr195602.XXXXXX.com.pem' however, that doesn't help either; I get same errors as before. Not sure why in the logs it shows comparing access rules by IP and not hostname. Is there any Nginx configuration to change this behavior?

Read the article
pasenger does not start puppet master under nginx

- by Anadi Misra

On the server [root@bangvmpllDA02 logs]# ruby -v ruby 1.8.7 (2011-06-30 patchlevel 352) [x86_64-linux] [root@bangvmpllDA02 logs]# puppet --version 3.0.1 and [root@bangvmpllDA02 logs]# service nginx configtest nginx: the configuration file /apps/nginx/nginx.conf syntax is ok nginx: configuration file /apps/nginx/nginx.conf test is successful [root@bangvmpllDA02 logs]# service nginx status nginx (pid 25923 25921 25920 25917 25908) is running... [root@bangvmpllDA02 logs]# however none of my agents are able to connect to the master, they all fail with errors like so [amisr1@blramisr195602 ~]$ puppet agent --test --verbose --server bangvmpllda02.XXX.com Info: Creating a new SSL certificate request for blramisr195602.XXX.com Info: Certificate Request fingerprint (SHA256): 26:EB:08:1F:82:32:E4:03:7A:64:8E:30:A3:99:93:26:E6:66:B9:B0:49:B6:08:F9:67:CA:1B:0C:00:B9:1D:41 Error: Could not request certificate: Error 405 on SERVER: <html> <head><title>405 Not Allowed</title></head> <body bgcolor="white"> <center><h1>405 Not Allowed</h1></center> <hr><center>nginx</center> </body> </html> Exiting; failed to retrieve certificate and waitforcert is disabled when I check logs on puppet master [root@bangvmpllDA02 logs]# tail puppet_access.log [05/Dec/2012:17:45:18 +0530] "GET /production/certificate/ca? HTTP/1.1" 404 162 "-" "Ruby" [05/Dec/2012:18:32:23 +0530] "PUT /production/certificate_request/sl63anadi.XXX.com HTTP/1.1" 405 166 "-" "-" [05/Dec/2012:18:33:33 +0530] "GET /production/certificate/sl63anadi.XXX.com? HTTP/1.1" 404 162 "-" "-" [05/Dec/2012:18:33:33 +0530] "GET /production/certificate_request/sl63anadi.XXX.com? HTTP/1.1" 404 162 "-" "-" [05/Dec/2012:18:33:33 +0530] "PUT /production/certificate_request/sl63anadi.XXX.com HTTP/1.1" 405 166 "-" "-" and the error logs show that nginx is not really able to process the request well 2012/12/05 18:33:33 [error] 25920#0: *23 open() "/etc/puppet/rack/public/production/certificate/sl63anadi.XXX.com" failed (2: No such file or directory), client: 10.209.47.26, server: , request: "GET /production/certificate/sl63anadi.XXX.com? HTTP/1.1", host: "bangvmpllda02.XXX.com:8140" 2012/12/05 18:33:33 [error] 25920#0: *24 open() "/etc/puppet/rack/public/production/certificate_request/sl63anadi.XXX.com" failed (2: No such file or directory), client: 10.209.47.26, server: , request: "GET /production/certificate_request/sl63anadi.XXX.com? HTTP/1.1", host: "bangvmpllda02.XXX.com:8140" 2012/12/05 18:47:56 [error] 25923#0: *27 open() "/etc/puppet/rack/public/production/certificate/ca" failed (2: No such file or directory), client: 10.209.47.31, server: , request: "GET /production/certificate/ca? HTTP/1.1", host: "bangvmpllda02.XXX.com:8140" 2012/12/05 18:47:56 [error] 25923#0: *28 open() "/etc/puppet/rack/public/production/certificate_request/blramisr195602.XXX.com" failed (2: No such file or directory), client: 10.209.47.31, server: , request: "GET /production/certificate_request/blramisr195602.XXX.com? HTTP/1.1", host: "bangvmpllda02.XXX.com:8140" Passenger does not show any application groups either [root@bangvmpllDA02 nginx]# passenger-status ----------- General information ----------- max = 15 count = 0 active = 0 inactive = 0 Waiting on global queue: 0 ----------- Application groups ----------- [root@bangvmpllDA02 nginx]# here's my nginx configuration [root@bangvmpllDA02 logs]# cat ../nginx.conf user puppet; worker_processes 4; #error_log logs/error.log; #error_log logs/error.log notice; error_log logs/error.log info; #pid logs/nginx.pid; events { use epoll; worker_connections 1024; } http { include mime.types; default_type application/octet-stream; log_format main '$remote_addr - $remote_user [$time_local] "$request" ' '$status $body_bytes_sent "$http_referer" ' '"$http_user_agent" "$http_x_forwarded_for"'; access_log logs/access.log main; sendfile on; #tcp_nopush on; server_tokens off; #keepalive_timeout 0; keepalive_timeout 120; gzip on; gzip_http_version 1.1; gzip_disable "msie6"; gzip_vary on; gzip_min_length 1100; gzip_buffers 64 8k; gzip_comp_level 3; gzip_proxied any; gzip_types text/plain text/css application/x-javascript text/xml application/xml; server { listen 80; server_name bangvmpllda02.XXXX.com; charset utf-8; #access_log logs/http.access.log main; location / { root html; index index.html index.htm index.php; } #error_page 404 /404.html; # redirect server error pages to the static page /50x.html # error_page 500 502 503 504 /50x.html; location = /50x.html { root html; } # proxy the PHP scripts to Apache listening on 127.0.0.1:80 # #location ~ \.php$ { # proxy_pass http://127.0.0.1; #} # pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000 # location ~ \.php$ { root html; fastcgi_pass unix:/var/run/php-fpm/php-fpm.sock; fastcgi_index index.php; fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name; fastcgi_param SCRIPT_NAME $fastcgi_script_name; include fastcgi_params; } # deny access to .htaccess files, if Apache's document root # concurs with nginx's one # location ~ /\.ht { access_log off; log_not_found off; deny all; } location ~* \.(jpg|jpeg|gif|png|css|js|ico|xml)$ { access_log off; log_not_found off; expires 2d; } } # Passenger needed for puppet passenger_root /usr/lib/ruby/gems/1.8/gems/passenger-3.0.18; passenger_ruby /usr/bin/ruby; passenger_max_pool_size 15; server { ssl on; listen 8140 default ssl; server_name bangvmpllda02.XXXX.com; passenger_enabled on; passenger_set_cgi_param HTTP_X_CLIENT_DN $ssl_client_s_dn; passenger_set_cgi_param HTTP_X_CLIENT_VERIFY $ssl_client_verify; passenger_min_instances 5; access_log logs/puppet_access.log; error_log logs/puppet_error.log; root /etc/puppet/rack/public; ssl_certificate /var/lib/puppet/ssl/certs/bangvmpllda02.XXX.com.pem; ssl_certificate_key /var/lib/puppet/ssl/private_keys/bangvmpllda02.XXX.com.pem; ssl_crl /var/lib/puppet/ssl/ca/ca_crl.pem; ssl_client_certificate /var/lib/puppet/ssl/certs/ca.pem; ssl_ciphers SSLv2:-LOW:-EXPORT:RC4+RSA; ssl_prefer_server_ciphers on; ssl_verify_client optional; ssl_verify_depth 1; ssl_session_cache shared:SSL:128m; ssl_session_timeout 5m; } } and the puppet.conf [main] # The Puppet log directory. # The default value is '$vardir/log'. logdir = /var/log/puppet # Where Puppet PID files are kept. # The default value is '$vardir/run'. rundir = /var/run/puppet dns_alt_names = devops.XXXX.com,devops confdir = /etc/puppet vardir = /var/lib/puppet storeconfigs = true storeconfigs_backend = puppetdb thin_storeconfigs = false async_storeconfigs = false ssl_client_header = SSL_CLIENT_S_D ssl_client_verify_header = SSL_CLIENT_VERIFY # Where SSL certificates are kept. # The default value is '$confdir/ssl'. ssldir = $vardir/ssl any ideas where am I going wrong? I checkthe directory permissions; /usr/share/puppet, /etc/puppet and /var/lib/puppet (and files inside them) are owned by puppet user.

Read the article
Strange Recurrent Excessive I/O Wait

- by Chris

I know quite well that I/O wait has been discussed multiple times on this site, but all the other topics seem to cover constant I/O latency, while the I/O problem we need to solve on our server occurs at irregular (short) intervals, but is ever-present with massive spikes of up to 20k ms a-wait and service times of 2 seconds. The disk affected is /dev/sdb (Seagate Barracuda, for details see below). A typical iostat -x output would at times look like this, which is an extreme sample but by no means rare: iostat (Oct 6, 2013) tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 16.00 0.00 156.00 9.75 21.89 288.12 36.00 57.60 5.50 0.00 44.00 8.00 48.79 2194.18 181.82 100.00 2.00 0.00 16.00 8.00 46.49 3397.00 500.00 100.00 4.50 0.00 40.00 8.89 43.73 5581.78 222.22 100.00 14.50 0.00 148.00 10.21 13.76 5909.24 68.97 100.00 1.50 0.00 12.00 8.00 8.57 7150.67 666.67 100.00 0.50 0.00 4.00 8.00 6.31 10168.00 2000.00 100.00 2.00 0.00 16.00 8.00 5.27 11001.00 500.00 100.00 0.50 0.00 4.00 8.00 2.96 17080.00 2000.00 100.00 34.00 0.00 1324.00 9.88 1.32 137.84 4.45 59.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 22.00 44.00 204.00 11.27 0.01 0.27 0.27 0.60 Let me provide you with some more information regarding the hardware. It's a Dell 1950 III box with Debian as OS where uname -a reports the following: Linux xx 2.6.32-5-amd64 #1 SMP Fri Feb 15 15:39:52 UTC 2013 x86_64 GNU/Linux The machine is a dedicated server that hosts an online game without any databases or I/O heavy applications running. The core application consumes about 0.8 of the 8 GBytes RAM, and the average CPU load is relatively low. The game itself, however, reacts rather sensitive towards I/O latency and thus our players experience massive ingame lag, which we would like to address as soon as possible. iostat: avg-cpu: %user %nice %system %iowait %steal %idle 1.77 0.01 1.05 1.59 0.00 95.58 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sdb 13.16 25.42 135.12 504701011 2682640656 sda 1.52 0.74 20.63 14644533 409684488 Uptime is: 19:26:26 up 229 days, 17:26, 4 users, load average: 0.36, 0.37, 0.32 Harddisk controller: 01:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04) Harddisks: Array 1, RAID-1, 2x Seagate Cheetah 15K.5 73 GB SAS Array 2, RAID-1, 2x Seagate ST3500620SS Barracuda ES.2 500GB 16MB 7200RPM SAS Partition information from df: Filesystem 1K-blocks Used Available Use% Mounted on /dev/sdb1 480191156 30715200 425083668 7% /home /dev/sda2 7692908 437436 6864692 6% / /dev/sda5 15377820 1398916 13197748 10% /usr /dev/sda6 39159724 19158340 18012140 52% /var Some more data samples generated with iostat -dx sdb 1 (Oct 11, 2013) Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sdb 0.00 15.00 0.00 70.00 0.00 656.00 9.37 4.50 1.83 4.80 33.60 sdb 0.00 0.00 0.00 2.00 0.00 16.00 8.00 12.00 836.00 500.00 100.00 sdb 0.00 0.00 0.00 3.00 0.00 32.00 10.67 9.96 1990.67 333.33 100.00 sdb 0.00 0.00 0.00 4.00 0.00 40.00 10.00 6.96 3075.00 250.00 100.00 sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.00 0.00 0.00 100.00 sdb 0.00 0.00 0.00 2.00 0.00 16.00 8.00 2.62 4648.00 500.00 100.00 sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.00 0.00 0.00 100.00 sdb 0.00 0.00 0.00 1.00 0.00 16.00 16.00 1.69 7024.00 1000.00 100.00 sdb 0.00 74.00 0.00 124.00 0.00 1584.00 12.77 1.09 67.94 6.94 86.00 Characteristic charts generated with rrdtool can be found here: iostat plot 1, 24 min interval: http://imageshack.us/photo/my-images/600/yqm3.png/ iostat plot 2, 120 min interval: http://imageshack.us/photo/my-images/407/griw.png/ As we have a rather large cache of 5.5 GBytes, we thought it might be a good idea to test if the I/O wait spikes would perhaps be caused by cache miss events. Therefore, we did a sync and then this to flush the cache and buffers: echo 3 > /proc/sys/vm/drop_caches and directly afterwards the I/O wait and service times virtually went through the roof, and everything on the machine felt like slow motion. During the next few hours the latency recovered and everything was as before - small to medium lags in short, unpredictable intervals. Now my question is: does anybody have any idea what might cause this annoying behaviour? Is it the first indication of the disk array or the raid controller dying, or something that can be easily mended by rebooting? (At the moment we're very reluctant to do this, however, because we're afraid that the disks might not come back up again.) Any help is greatly appreciated. Thanks in advance, Chris. Edited to add: we do see one or two processes go to 'D' state in top, one of which seems to be kjournald rather frequently. If I'm not mistaken, however, this does not indicate the processes causing the latency, but rather those affected by it - correct me if I'm wrong. Does the information about uninterruptibly sleeping processes help us in any way to address the problem? @Andy Shinn requested smartctl data, here it is: smartctl -a -d megaraid,2 /dev/sdb yields: smartctl 5.40 2010-07-12 r3124 [x86_64-unknown-linux-gnu] (local build) Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net Device: SEAGATE ST3500620SS Version: MS05 Serial number: Device type: disk Transport protocol: SAS Local Time is: Mon Oct 14 20:37:13 2013 CEST Device supports SMART and is Enabled Temperature Warning Disabled or Not Supported SMART Health Status: OK Current Drive Temperature: 20 C Drive Trip Temperature: 68 C Elements in grown defect list: 0 Vendor (Seagate) cache information Blocks sent to initiator = 1236631092 Blocks received from initiator = 1097862364 Blocks read from cache and sent to initiator = 1383620256 Number of read and write commands whose size <= segment size = 531295338 Number of read and write commands whose size > segment size = 51986460 Vendor (Seagate/Hitachi) factory information number of hours powered up = 36556.93 number of minutes until next internal SMART test = 32 Error counter log: Errors Corrected by Total Correction Gigabytes Total ECC rereads/ errors algorithm processed uncorrected fast | delayed rewrites corrected invocations [10^9 bytes] errors read: 509271032 47 0 509271079 509271079 20981.423 0 write: 0 0 0 0 0 5022.039 0 verify: 1870931090 196 0 1870931286 1870931286 100558.708 0 Non-medium error count: 0 SMART Self-test log Num Test Status segment LifeTime LBA_first_err [SK ASC ASQ] Description number (hours) # 1 Background short Completed 16 36538 - [- - -] # 2 Background short Completed 16 36514 - [- - -] # 3 Background short Completed 16 36490 - [- - -] # 4 Background short Completed 16 36466 - [- - -] # 5 Background short Completed 16 36442 - [- - -] # 6 Background long Completed 16 36420 - [- - -] # 7 Background short Completed 16 36394 - [- - -] # 8 Background short Completed 16 36370 - [- - -] # 9 Background long Completed 16 36364 - [- - -] #10 Background short Completed 16 36361 - [- - -] #11 Background long Completed 16 2 - [- - -] #12 Background short Completed 16 0 - [- - -] Long (extended) Self Test duration: 6798 seconds [113.3 minutes] smartctl -a -d megaraid,3 /dev/sdb yields: smartctl 5.40 2010-07-12 r3124 [x86_64-unknown-linux-gnu] (local build) Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net Device: SEAGATE ST3500620SS Version: MS05 Serial number: Device type: disk Transport protocol: SAS Local Time is: Mon Oct 14 20:37:26 2013 CEST Device supports SMART and is Enabled Temperature Warning Disabled or Not Supported SMART Health Status: OK Current Drive Temperature: 19 C Drive Trip Temperature: 68 C Elements in grown defect list: 0 Vendor (Seagate) cache information Blocks sent to initiator = 288745640 Blocks received from initiator = 1097848399 Blocks read from cache and sent to initiator = 1304149705 Number of read and write commands whose size <= segment size = 527414694 Number of read and write commands whose size > segment size = 51986460 Vendor (Seagate/Hitachi) factory information number of hours powered up = 36596.83 number of minutes until next internal SMART test = 28 Error counter log: Errors Corrected by Total Correction Gigabytes Total ECC rereads/ errors algorithm processed uncorrected fast | delayed rewrites corrected invocations [10^9 bytes] errors read: 610862490 44 0 610862534 610862534 20470.133 0 write: 0 0 0 0 0 5022.480 0 verify: 2861227413 203 0 2861227616 2861227616 100872.443 0 Non-medium error count: 1 SMART Self-test log Num Test Status segment LifeTime LBA_first_err [SK ASC ASQ] Description number (hours) # 1 Background short Completed 16 36580 - [- - -] # 2 Background short Completed 16 36556 - [- - -] # 3 Background short Completed 16 36532 - [- - -] # 4 Background short Completed 16 36508 - [- - -] # 5 Background short Completed 16 36484 - [- - -] # 6 Background long Completed 16 36462 - [- - -] # 7 Background short Completed 16 36436 - [- - -] # 8 Background short Completed 16 36412 - [- - -] # 9 Background long Completed 16 36404 - [- - -] #10 Background short Completed 16 36401 - [- - -] #11 Background long Completed 16 2 - [- - -] #12 Background short Completed 16 0 - [- - -] Long (extended) Self Test duration: 6798 seconds [113.3 minutes]

Read the article
solved: passenger(mod_rails) fails to start puppet master under nginx

- by Anadi Misra

On the server [root@bangvmpllDA02 logs]# ruby -v ruby 1.8.7 (2011-06-30 patchlevel 352) [x86_64-linux] [root@bangvmpllDA02 logs]# puppet --version 3.0.1 and [root@bangvmpllDA02 logs]# service nginx configtest nginx: the configuration file /apps/nginx/nginx.conf syntax is ok nginx: configuration file /apps/nginx/nginx.conf test is successful [root@bangvmpllDA02 logs]# service nginx status nginx (pid 25923 25921 25920 25917 25908) is running... [root@bangvmpllDA02 logs]# however none of my agents are able to connect to the master, they all fail with errors like so [amisr1@blramisr195602 ~]$ puppet agent --test --verbose --server bangvmpllda02.XXX.com Info: Creating a new SSL certificate request for blramisr195602.XXX.com Info: Certificate Request fingerprint (SHA256): 26:EB:08:1F:82:32:E4:03:7A:64:8E:30:A3:99:93:26:E6:66:B9:B0:49:B6:08:F9:67:CA:1B:0C:00:B9:1D:41 Error: Could not request certificate: Error 405 on SERVER: <html> <head><title>405 Not Allowed</title></head> <body bgcolor="white"> <center><h1>405 Not Allowed</h1></center> <hr><center>nginx</center> </body> </html> Exiting; failed to retrieve certificate and waitforcert is disabled when I check logs on puppet master [root@bangvmpllDA02 logs]# tail puppet_access.log [05/Dec/2012:17:45:18 +0530] "GET /production/certificate/ca? HTTP/1.1" 404 162 "-" "Ruby" [05/Dec/2012:18:32:23 +0530] "PUT /production/certificate_request/sl63anadi.XXX.com HTTP/1.1" 405 166 "-" "-" [05/Dec/2012:18:33:33 +0530] "GET /production/certificate/sl63anadi.XXX.com? HTTP/1.1" 404 162 "-" "-" [05/Dec/2012:18:33:33 +0530] "GET /production/certificate_request/sl63anadi.XXX.com? HTTP/1.1" 404 162 "-" "-" [05/Dec/2012:18:33:33 +0530] "PUT /production/certificate_request/sl63anadi.XXX.com HTTP/1.1" 405 166 "-" "-" and the error logs show that nginx is not really able to process the request well 2012/12/05 18:33:33 [error] 25920#0: *23 open() "/etc/puppet/rack/public/production/certificate/sl63anadi.XXX.com" failed (2: No such file or directory), client: 10.209.47.26, server: , request: "GET /production/certificate/sl63anadi.XXX.com? HTTP/1.1", host: "bangvmpllda02.XXX.com:8140" 2012/12/05 18:33:33 [error] 25920#0: *24 open() "/etc/puppet/rack/public/production/certificate_request/sl63anadi.XXX.com" failed (2: No such file or directory), client: 10.209.47.26, server: , request: "GET /production/certificate_request/sl63anadi.XXX.com? HTTP/1.1", host: "bangvmpllda02.XXX.com:8140" 2012/12/05 18:47:56 [error] 25923#0: *27 open() "/etc/puppet/rack/public/production/certificate/ca" failed (2: No such file or directory), client: 10.209.47.31, server: , request: "GET /production/certificate/ca? HTTP/1.1", host: "bangvmpllda02.XXX.com:8140" 2012/12/05 18:47:56 [error] 25923#0: *28 open() "/etc/puppet/rack/public/production/certificate_request/blramisr195602.XXX.com" failed (2: No such file or directory), client: 10.209.47.31, server: , request: "GET /production/certificate_request/blramisr195602.XXX.com? HTTP/1.1", host: "bangvmpllda02.XXX.com:8140" Passenger does not show any application groups either [root@bangvmpllDA02 nginx]# passenger-status ----------- General information ----------- max = 15 count = 0 active = 0 inactive = 0 Waiting on global queue: 0 ----------- Application groups ----------- [root@bangvmpllDA02 nginx]# here's my nginx configuration [root@bangvmpllDA02 logs]# cat ../nginx.conf user puppet; worker_processes 4; #error_log logs/error.log; #error_log logs/error.log notice; error_log logs/error.log info; #pid logs/nginx.pid; events { use epoll; worker_connections 1024; } http { include mime.types; default_type application/octet-stream; log_format main '$remote_addr - $remote_user [$time_local] "$request" ' '$status $body_bytes_sent "$http_referer" ' '"$http_user_agent" "$http_x_forwarded_for"'; access_log logs/access.log main; sendfile on; #tcp_nopush on; server_tokens off; #keepalive_timeout 0; keepalive_timeout 120; gzip on; gzip_http_version 1.1; gzip_disable "msie6"; gzip_vary on; gzip_min_length 1100; gzip_buffers 64 8k; gzip_comp_level 3; gzip_proxied any; gzip_types text/plain text/css application/x-javascript text/xml application/xml; server { listen 80; server_name bangvmpllda02.XXXX.com; charset utf-8; #access_log logs/http.access.log main; location / { root html; index index.html index.htm index.php; } #error_page 404 /404.html; # redirect server error pages to the static page /50x.html # error_page 500 502 503 504 /50x.html; location = /50x.html { root html; } # proxy the PHP scripts to Apache listening on 127.0.0.1:80 # #location ~ \.php$ { # proxy_pass http://127.0.0.1; #} # pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000 # location ~ \.php$ { root html; fastcgi_pass unix:/var/run/php-fpm/php-fpm.sock; fastcgi_index index.php; fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name; fastcgi_param SCRIPT_NAME $fastcgi_script_name; include fastcgi_params; } # deny access to .htaccess files, if Apache's document root # concurs with nginx's one # location ~ /\.ht { access_log off; log_not_found off; deny all; } location ~* \.(jpg|jpeg|gif|png|css|js|ico|xml)$ { access_log off; log_not_found off; expires 2d; } } # Passenger needed for puppet passenger_root /usr/lib/ruby/gems/1.8/gems/passenger-3.0.18; passenger_ruby /usr/bin/ruby; passenger_max_pool_size 15; server { ssl on; listen 8140 default ssl; server_name bangvmpllda02.XXXX.com; passenger_enabled on; passenger_set_cgi_param HTTP_X_CLIENT_DN $ssl_client_s_dn; passenger_set_cgi_param HTTP_X_CLIENT_VERIFY $ssl_client_verify; passenger_min_instances 5; access_log logs/puppet_access.log; error_log logs/puppet_error.log; root /etc/puppet/rack/public; ssl_certificate /var/lib/puppet/ssl/certs/bangvmpllda02.XXX.com.pem; ssl_certificate_key /var/lib/puppet/ssl/private_keys/bangvmpllda02.XXX.com.pem; ssl_crl /var/lib/puppet/ssl/ca/ca_crl.pem; ssl_client_certificate /var/lib/puppet/ssl/certs/ca.pem; ssl_ciphers SSLv2:-LOW:-EXPORT:RC4+RSA; ssl_prefer_server_ciphers on; ssl_verify_client optional; ssl_verify_depth 1; ssl_session_cache shared:SSL:128m; ssl_session_timeout 5m; } } and the puppet.conf [main] # The Puppet log directory. # The default value is '$vardir/log'. logdir = /var/log/puppet # Where Puppet PID files are kept. # The default value is '$vardir/run'. rundir = /var/run/puppet dns_alt_names = devops.XXXX.com,devops confdir = /etc/puppet vardir = /var/lib/puppet storeconfigs = true storeconfigs_backend = puppetdb thin_storeconfigs = false async_storeconfigs = false ssl_client_header = SSL_CLIENT_S_D ssl_client_verify_header = SSL_CLIENT_VERIFY # Where SSL certificates are kept. # The default value is '$confdir/ssl'. ssldir = $vardir/ssl any ideas where am I going wrong? I checkthe directory permissions; /usr/share/puppet, /etc/puppet and /var/lib/puppet (and files inside them) are owned by puppet user. Solved The simple solution to my complicated problem was that I had placed the config.ru in wrong place moved it to /etc/puppet/rack , it was in /etc/puppet/rack/public Well!!! :-/

Read the article
Connect ps/2->usb keyboard to linux?

- by Daniel

I have a lovely ancient ergonomic keyboard (no name SK - 6000) connected via a DIN-ps/2 adapter to a ps/2-usb adapter to my docking station. After Grub it stops working. It takes either suspending and waking up or replugging it while Linux is running to get it to work. No extra kernel modules get loaded for this. When it works and I restart without power off, it will work immediately. Even when it does not work, it is visible (lsusb device number varies but output is identical whether working or not): $ lsusb -v -s 001:006 Bus 001 Device 006: ID 0a81:0205 Chesen Electronics Corp. PS/2 Keyboard+Mouse Adapter Device Descriptor: bLength 18 bDescriptorType 1 bcdUSB 1.10 bDeviceClass 0 (Defined at Interface level) bDeviceSubClass 0 bDeviceProtocol 0 bMaxPacketSize0 8 idVendor 0x0a81 Chesen Electronics Corp. idProduct 0x0205 PS/2 Keyboard+Mouse Adapter bcdDevice 0.10 iManufacturer 1 CHESEN iProduct 2 PS2 to USB Converter iSerial 0 bNumConfigurations 1 Configuration Descriptor: bLength 9 bDescriptorType 2 wTotalLength 59 bNumInterfaces 2 bConfigurationValue 1 iConfiguration 2 PS2 to USB Converter bmAttributes 0xa0 (Bus Powered) Remote Wakeup MaxPower 100mA Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 0 bAlternateSetting 0 bNumEndpoints 1 bInterfaceClass 3 Human Interface Device bInterfaceSubClass 1 Boot Interface Subclass bInterfaceProtocol 1 Keyboard iInterface 0 HID Device Descriptor: bLength 9 bDescriptorType 33 bcdHID 1.10 bCountryCode 0 Not supported bNumDescriptors 1 bDescriptorType 34 Report wDescriptorLength 64 Report Descriptors: ** UNAVAILABLE ** Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x81 EP 1 IN bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x0008 1x 8 bytes bInterval 10 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 1 bAlternateSetting 0 bNumEndpoints 1 bInterfaceClass 3 Human Interface Device bInterfaceSubClass 1 Boot Interface Subclass bInterfaceProtocol 2 Mouse iInterface 0 HID Device Descriptor: bLength 9 bDescriptorType 33 bcdHID 1.10 bCountryCode 0 Not supported bNumDescriptors 1 bDescriptorType 34 Report wDescriptorLength 148 Report Descriptors: ** UNAVAILABLE ** Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x82 EP 2 IN bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x0008 1x 8 bytes bInterval 10 Device Status: 0x0000 (Bus Powered) $ ll -R /sys/bus/hid/drivers/ /sys/bus/hid/drivers/: total 0 drwxr-xr-x 2 root root 0 Jul 8 2012 generic-usb/ /sys/bus/hid/drivers/generic-usb: total 0 lrwxrwxrwx 1 root root 0 Jul 7 23:33 0003:046D:C03D.0003 -> ../../../../devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.2/1-1.2.2:1.0/0003:046D:C03D.0003/ lrwxrwxrwx 1 root root 0 Jul 7 23:33 0003:0A81:0205.0001 -> ../../../../devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.0/0003:0A81:0205.0001/ lrwxrwxrwx 1 root root 0 Jul 7 23:33 0003:0A81:0205.0002 -> ../../../../devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.1/0003:0A81:0205.0002/ --w------- 1 root root 4096 Jul 7 23:32 bind lrwxrwxrwx 1 root root 0 Jul 7 23:33 module -> ../../../../module/usbhid/ --w------- 1 root root 4096 Jul 7 23:32 new_id --w------- 1 root root 4096 Jul 8 2012 uevent --w------- 1 root root 4096 Jul 7 23:32 unbind When replugging, dmesg shows this (which except for the 1st line and different input numbers already came at boot time): [ 1583.295385] usb 1-1.2.1: new low-speed USB device number 6 using ehci_hcd [ 1583.446514] input: CHESEN PS2 to USB Converter as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.0/input/input17 [ 1583.446817] generic-usb 0003:0A81:0205.0001: input,hidraw0: USB HID v1.10 Keyboard [CHESEN PS2 to USB Converter] on usb-0000:00:1a.0-1.2.1/input0 [ 1583.454764] input: CHESEN PS2 to USB Converter as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.1/input/input18 [ 1583.455534] generic-usb 0003:0A81:0205.0002: input,hidraw1: USB HID v1.10 Mouse [CHESEN PS2 to USB Converter] on usb-0000:00:1a.0-1.2.1/input1 [ 1583.455578] usbcore: registered new interface driver usbhid [ 1583.455584] usbhid: USB HID core driver So I tried $ sudo udevadm test /sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.0/0003:0A81:0205.0001/hidraw/hidraw0 run_command: calling: test adm_test: version 175 This program is for debugging only, it does not run any program, specified by a RUN key. It may show incorrect results, because some values may be different, or not available at a simulation run. parse_file: reading '/lib/udev/rules.d/40-crda.rules' as rules file parse_file: reading '/lib/udev/rules.d/40-fuse.rules' as rules file ... parse_file: reading '/lib/udev/rules.d/40-usb-media-players.rules' as rules file parse_file: reading '/lib/udev/rules.d/40-usb_modeswitch.rules' as rules file ... parse_file: reading '/lib/udev/rules.d/42-qemu-usb.rules' as rules file ... parse_file: reading '/lib/udev/rules.d/69-cd-sensors.rules' as rules file add_rule: IMPORT found builtin 'usb_id', replacing /lib/udev/rules.d/69-cd-sensors.rules:76 ... parse_file: reading '/lib/udev/rules.d/77-mm-usb-device-blacklist.rules' as rules file ... parse_file: reading '/lib/udev/rules.d/85-usbmuxd.rules' as rules file ... parse_file: reading '/lib/udev/rules.d/95-upower-hid.rules' as rules file parse_file: reading '/lib/udev/rules.d/95-upower-wup.rules' as rules file parse_file: reading '/lib/udev/rules.d/97-bluetooth-hid2hci.rules' as rules file udev_rules_new: rules use 271500 bytes tokens (22625 * 12 bytes), 44331 bytes buffer udev_rules_new: temporary index used 76320 bytes (3816 * 20 bytes) udev_device_new_from_syspath: device 0x7f78a5e4d2d0 has devpath '/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.0/0003:0A81:0205.0001/hidraw/hidraw0' udev_device_new_from_syspath: device 0x7f78a5e5f820 has devpath '/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.0/0003:0A81:0205.0001/hidraw/hidraw0' udev_device_read_db: device 0x7f78a5e5f820 filled with db file data udev_device_new_from_syspath: device 0x7f78a5e60270 has devpath '/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.0/0003:0A81:0205.0001' udev_device_new_from_syspath: device 0x7f78a5e609c0 has devpath '/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.0' udev_device_new_from_syspath: device 0x7f78a5e61160 has devpath '/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1' udev_device_new_from_syspath: device 0x7f78a5e61960 has devpath '/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2' udev_device_new_from_syspath: device 0x7f78a5e62150 has devpath '/devices/pci0000:00/0000:00:1a.0/usb1/1-1' udev_device_new_from_syspath: device 0x7f78a5e62940 has devpath '/devices/pci0000:00/0000:00:1a.0/usb1' udev_device_new_from_syspath: device 0x7f78a5e630f0 has devpath '/devices/pci0000:00/0000:00:1a.0' udev_device_new_from_syspath: device 0x7f78a5e638a0 has devpath '/devices/pci0000:00' udev_event_execute_rules: no node name set, will use kernel supplied name 'hidraw0' udev_node_add: creating device node '/dev/hidraw0', devnum=251:0, mode=0600, uid=0, gid=0 udev_node_mknod: preserve file '/dev/hidraw0', because it has correct dev_t udev_node_mknod: preserve permissions /dev/hidraw0, 020600, uid=0, gid=0 node_symlink: preserve already existing symlink '/dev/char/251:0' to '../hidraw0' udev_device_update_db: created empty file '/run/udev/data/c251:0' for '/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.0/0003:0A81:0205.0001/hidraw/hidraw0' ACTION=add DEVNAME=/dev/hidraw0 DEVPATH=/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.0/0003:0A81:0205.0001/hidraw/hidraw0 MAJOR=251 MINOR=0 SUBSYSTEM=hidraw UDEV_LOG=6 USEC_INITIALIZED=969079051 The later lines sound like it's already there. And none of these awakes the keyboard: $ sudo udevadm trigger --verbose --sysname-match=usb* /sys/devices/pci0000:00/0000:00:1a.0/usb1 /sys/devices/pci0000:00/0000:00:1a.0/usbmon/usbmon1 /sys/devices/pci0000:00/0000:00:1d.0/usb2 /sys/devices/pci0000:00/0000:00:1d.0/usbmon/usbmon2 /sys/devices/virtual/usbmon/usbmon0 $ sudo udevadm trigger --verbose --sysname-match=hidraw0 /sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.0/0003:0A81:0205.0001/hidraw/hidraw0 $ sudo udevadm trigger I also tried this to no avail: # echo -n 0003:0A81:0205.0001 > /sys/bus/hid/drivers/generic-usb/bind ksh: echo: write to 1 failed [No such device] # echo -n 0003:0A81:0205.0001 > /sys/bus/hid/drivers/generic-usb/unbind # echo -n 0003:0A81:0205.0001 > /sys/bus/hid/drivers/generic-usb/bind # echo usb1 >/sys/bus/usb/drivers/usb/unbind # echo usb1 >/sys/bus/usb/drivers/usb/bind What else should I try to get the same result as replugging or suspending, by just issuing a command?

Read the article
Why are my hard drives failing?

- by WishCow

I have a small Ubuntu server running at home, with 2 HDDs. There are two software raids (raid1) on the disks, managed by mdadm, which I believe is irrelevant, but mentioning it anyway. Both of the HDDs are Western Digital, and have been used for around 2 years, when one of them started making clicking noises, and died. I figured that maybe it's natural after 2 years, so I bought a new one, and resynced the raid arrays. After about a month, the other drive also died. I didn't get suspicious, since both drives have been bought at the same time, it's not that surprising to see both of them near each other, so I bought another one. So far, 2 old drives failed, and 2 brand new in the system. After one month, one of the new drives died. This is when it started getting suspicious. Since the PC was put together from some really old parts (think AthlonXP), I figured that maybe the motherboard's SATA controller is the culprit. Of course you can't switch parts easily in an old PC like this, so I bought a whole system, new MB, new CPU, new RAM. Took the just failed drive back, since it was under warranty, and got it replaced. So it is up to 2 failed drives from the old ones, and 1 failed drive from the new ones. No problems, for 1 month. After that errors were creeping up again in /var/log/messages, and mdadm was reporting raid array failures. I started tearing my hair out. Everything is new in the system, it's up to the third brand new HDD, it's simply not possible that all of the new drives that I bought were faulty. Let's see what is still common... the cables. Okay, long shot, let's replace the SATA cables. Take HDD back, smile to the guy at the counter and say that I'm really unlucky. He replaces the HDD. I come home, one month passes and one of HDDs fails, again. I'm not joking. Two of the brand new HDDs have failed. Maybe it's a bug in the OS. Let's see what the manufacturer's testing tool says. Download testing tool, burn it to a CD, reboot, leave HDD testing overnight. Test says that the drive is faulty, and I should back up everything, if I still can. I don't know what's happening, but it does not look like a software problem, something is definitely thrashing the HDDs. I should mention now, that the whole system is in a shoebox. Since there are a load of "build your own ikea case" stuff, I thought there shouldn't be any problems throwing the thing in a box, and stuffing it away somewhere. The box is well ventilated, but I thought that just maybe the drives were overheating. There is no other possible answer to this. So I took the HDD back, and got it replaced (for the 3rd time), and bought HDD coolers. And just now, I have heard the sound of doom. click click whizzzzzzzzz. SSH into the box: You have new mail! mail r 1 DegradedArrayEvent on /dev/md0 ... dmesg output: [47128.000051] ata3: lost interrupt (Status 0x50) [47128.000097] end_request: I/O error, dev sda, sector 58588863 [47128.000134] md: super_written gets error=-5, uptodate=0 [48043.976054] ata3: lost interrupt (Status 0x50) [48043.976086] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [48043.976132] ata3.00: cmd c8/00:18:bf:40:52/00:00:00:00:00/e1 tag 0 dma 12288 in [48043.976135] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [48043.976208] ata3.00: status: { DRDY } [48043.976241] ata3: soft resetting link [48044.148446] ata3.00: configured for UDMA/133 [48044.148457] ata3.00: device reported invalid CHS sector 0 [48044.148477] ata3: EH complete Recap: No possibility of overheating 6 drives have failed, 4 of those have been brand new. I'm not sure now that the original two have been faulty, or suffered the same thing that the new ones. There is nothing common in the system, apart from the OS which is Ubuntu Karmic now (started with Jaunty). New MB, new CPU, new RAM, new SATA cables. No, the little holes on the HDD are not covered I'm crying. Really. I don't have the face to return to the store now, it's not possible for 4 drives to fail under 4 months. A few ideas that I have been thinking: Is it possible that I fuck up something when I partition and resync the drives? Can it be so bad that it physicaly wrecks the drive? (since the vendor supplied tool says that the drive is damaged) I do the partitoning with fdisk, and use the same block size for the raid1 partitions (I check the exact block sizes with fdisk -lu) Is it possible that the linux kernel or mdadm, or something is not compatible with this exact brand of HDDs, and thrashes them? Is it possible that it may be the shoebox? Try placing it somewhere else? It's under a shelf now, so humidity is not a problem either. Is it possible that a normal PC case will solve my problem (I'm going to shoot myself then)? I will get a picture tomorrow. Am I just simply cursed? Any help or speculation is greatly appreciated. Edit: The power strip is guarded against overvoltage. Edit2: I have moved inbetween these 4 months, so the possibility of the cause being "dirty" electricity in both places, is very low. Edit3: I have checked the voltages in the BIOS (couldn't borrow a multimeter), and they are all seem correct, the biggest discrepancy is in the 12V, because it's supplying 11.3. Should I be worried about that? Edit4: I put my desktop PC's PSU into the server. The BIOS reported much more accurate voltage readings, and also it has successfully rebuilt the raid1 array, which took some 3-4 hours, so I feel a little positive now. Will get a new PSU tomorrow to test with that. Also, attaching the picture about the box: (disregard the 3rd drive)

Read the article
How to diagnose failing 6Gbps SATA connection?

- by whitequark

I have a Samsung RC530 notebook and OCZ Vertex-3 6Gbps SATA SSD working in AHCI mode. # dmesg | grep DMI SAMSUNG ELECTRONICS CO., LTD. RC530/RC730/RC530/RC730, BIOS 03WD.M008.20110927.PSA 09/27/2011 # lspci -nn 00:1f.2 SATA controller [0106]: Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller [8086:1c03] (rev 04) # sdparm -a /dev/sda /dev/sda: ATA OCZ-VERTEX3 2.15 At the boot, the following messages are present in dmesg (I am running Debian wheezy @ Linux 3.2.8): # dmesg | grep -iE '(ata|ahci)' [ 5.179783] ahci 0000:00:1f.2: version 3.0 [ 5.179802] ahci 0000:00:1f.2: PCI INT B -> GSI 19 (level, low) -> IRQ 19 [ 5.179864] ahci 0000:00:1f.2: irq 42 for MSI/MSI-X [ 5.195424] ahci 0000:00:1f.2: AHCI 0001.0300 32 slots 6 ports 6 Gbps 0x5 impl SATA mode [ 5.195429] ahci 0000:00:1f.2: flags: 64bit ncq sntf pm led clo pio slum part ems apst [ 5.195436] ahci 0000:00:1f.2: setting latency timer to 64 [ 5.204035] scsi0 : ahci [ 5.204301] scsi1 : ahci [ 5.204447] scsi2 : ahci [ 5.204592] scsi3 : ahci [ 5.204682] scsi4 : ahci [ 5.204799] scsi5 : ahci [ 5.204917] ata1: SATA max UDMA/133 abar m2048@0xf7c06000 port 0xf7c06100 irq 42 [ 5.204920] ata2: DUMMY [ 5.204923] ata3: SATA max UDMA/133 abar m2048@0xf7c06000 port 0xf7c06200 irq 42 [ 5.204924] ata4: DUMMY [ 5.204926] ata5: DUMMY [ 5.204927] ata6: DUMMY [ 5.523039] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300) [ 5.525911] ata3.00: ATAPI: TSSTcorp CDDVDW SN-208BB, SC00, max UDMA/100 [ 5.531006] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 5.533703] ata3.00: configured for UDMA/100 [ 5.542790] ata1.00: ATA-8: OCZ-VERTEX3, 2.15, max UDMA/133 [ 5.542800] ata1.00: 117231408 sectors, multi 16: LBA48 NCQ (depth 31/32), AA [ 5.552751] ata1.00: configured for UDMA/133 [ 5.553050] scsi 0:0:0:0: Direct-Access ATA OCZ-VERTEX3 2.15 PQ: 0 ANSI: 5 [ 5.559621] scsi 2:0:0:0: CD-ROM TSSTcorp CDDVDW SN-208BB SC00 PQ: 0 ANSI: 5 [ 5.564059] sd 0:0:0:0: [sda] 117231408 512-byte logical blocks: (60.0 GB/55.8 GiB) [ 5.564127] sd 0:0:0:0: [sda] Write Protect is off [ 5.564131] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 [ 5.564158] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 5.564582] sda: sda1 [ 5.564810] sd 0:0:0:0: [sda] Attached SCSI disk [ 5.572006] sr0: scsi3-mmc drive: 16x/24x writer dvd-ram cd/rw xa/form2 cdda tray [ 5.572010] cdrom: Uniform CD-ROM driver Revision: 3.20 [ 5.572189] sr 2:0:0:0: Attached scsi CD-ROM sr0 [ 6.717181] ata1.00: exception Emask 0x50 SAct 0x1 SErr 0x280900 action 0x6 frozen [ 6.717238] ata1.00: irq_stat 0x08000000, interface fatal error [ 6.717291] ata1: SError: { UnrecovData HostInt 10B8B BadCRC } [ 6.717342] ata1.00: failed command: READ FPDMA QUEUED [ 6.717395] ata1.00: cmd 60/50:00:20:39:58/00:00:00:00:00/40 tag 0 ncq 40960 in [ 6.717396] res 40/00:00:20:39:58/00:00:00:00:00/40 Emask 0x50 (ATA bus error) [ 6.717503] ata1.00: status: { DRDY } [ 6.717553] ata1: hard resetting link [ 7.033417] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 7.055234] ata1.00: configured for UDMA/133 [ 7.055262] ata1: EH complete [ 7.147280] ata1.00: exception Emask 0x10 SAct 0xf8 SErr 0x280100 action 0x6 frozen [ 7.147340] ata1.00: irq_stat 0x08000000, interface fatal error [ 7.147393] ata1: SError: { UnrecovData 10B8B BadCRC } [ 7.147460] ata1.00: failed command: READ FPDMA QUEUED [ 7.147529] ata1.00: cmd 60/08:18:88:17:41/00:00:02:00:00/40 tag 3 ncq 4096 in [ 7.147531] res 40/00:38:50:99:64/00:00:02:00:00/40 Emask 0x10 (ATA bus error) [ 7.147691] ata1.00: status: { DRDY } [ 7.147754] ata1.00: failed command: READ FPDMA QUEUED [ 7.147821] ata1.00: cmd 60/00:20:f8:42:4c/01:00:02:00:00/40 tag 4 ncq 131072 in [ 7.147822] res 40/00:38:50:99:64/00:00:02:00:00/40 Emask 0x10 (ATA bus error) [ 7.147977] ata1.00: status: { DRDY } [ 7.148036] ata1.00: failed command: READ FPDMA QUEUED [ 7.148100] ata1.00: cmd 60/50:28:f8:43:4c/00:00:02:00:00/40 tag 5 ncq 40960 in [ 7.148101] res 40/00:38:50:99:64/00:00:02:00:00/40 Emask 0x10 (ATA bus error) [ 7.148255] ata1.00: status: { DRDY } [ 7.148315] ata1.00: failed command: READ FPDMA QUEUED [ 7.148379] ata1.00: cmd 60/00:30:50:98:64/01:00:02:00:00/40 tag 6 ncq 131072 in [ 7.148380] res 40/00:38:50:99:64/00:00:02:00:00/40 Emask 0x10 (ATA bus error) [ 7.148534] ata1.00: status: { DRDY } [ 7.148593] ata1.00: failed command: READ FPDMA QUEUED [ 7.148657] ata1.00: cmd 60/00:38:50:99:64/01:00:02:00:00/40 tag 7 ncq 131072 in [ 7.148658] res 40/00:38:50:99:64/00:00:02:00:00/40 Emask 0x10 (ATA bus error) [ 7.148813] ata1.00: status: { DRDY } [ 7.148875] ata1: hard resetting link [ 7.464842] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 7.486794] ata1.00: configured for UDMA/133 [ 7.486822] ata1: EH complete [ 7.546395] ata1.00: exception Emask 0x10 SAct 0x2f SErr 0x280100 action 0x6 frozen [ 7.546470] ata1.00: irq_stat 0x08000000, interface fatal error [ 7.546531] ata1: SError: { UnrecovData 10B8B BadCRC } [ 7.546588] ata1.00: failed command: READ FPDMA QUEUED [ 7.546648] ata1.00: cmd 60/00:00:e0:4b:61/01:00:02:00:00/40 tag 0 ncq 131072 in [ 7.546649] res 40/00:28:e0:4c:61/00:00:02:00:00/40 Emask 0x10 (ATA bus error) [ 7.546794] ata1.00: status: { DRDY } [ 7.546847] ata1.00: failed command: READ FPDMA QUEUED [ 7.546906] ata1.00: cmd 60/00:08:90:2f:48/01:00:02:00:00/40 tag 1 ncq 131072 in [ 7.546907] res 40/00:28:e0:4c:61/00:00:02:00:00/40 Emask 0x10 (ATA bus error) [ 7.547053] ata1.00: status: { DRDY } [ 7.547106] ata1.00: failed command: READ FPDMA QUEUED [ 7.547165] ata1.00: cmd 60/00:10:90:30:48/01:00:02:00:00/40 tag 2 ncq 131072 in [ 7.547166] res 40/00:28:e0:4c:61/00:00:02:00:00/40 Emask 0x10 (ATA bus error) [ 7.547310] ata1.00: status: { DRDY } [ 7.547363] ata1.00: failed command: READ FPDMA QUEUED [ 7.547422] ata1.00: cmd 60/00:18:50:c7:64/01:00:02:00:00/40 tag 3 ncq 131072 in [ 7.547423] res 40/00:28:e0:4c:61/00:00:02:00:00/40 Emask 0x10 (ATA bus error) [ 7.547568] ata1.00: status: { DRDY } [ 7.547621] ata1.00: failed command: READ FPDMA QUEUED [ 7.547681] ata1.00: cmd 60/00:28:e0:4c:61/01:00:02:00:00/40 tag 5 ncq 131072 in [ 7.547682] res 40/00:28:e0:4c:61/00:00:02:00:00/40 Emask 0x10 (ATA bus error) [ 7.547825] ata1.00: status: { DRDY } [ 7.547882] ata1: hard resetting link [ 7.864408] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 7.886351] ata1.00: configured for UDMA/133 [ 7.886375] ata1: EH complete [ 7.890012] ata1: limiting SATA link speed to 3.0 Gbps [ 7.890016] ata1.00: exception Emask 0x10 SAct 0x7 SErr 0x280100 action 0x6 frozen [ 7.890093] ata1.00: irq_stat 0x08000000, interface fatal error [ 7.890152] ata1: SError: { UnrecovData 10B8B BadCRC } [ 7.890210] ata1.00: failed command: READ FPDMA QUEUED [ 7.890272] ata1.00: cmd 60/00:00:90:33:48/01:00:02:00:00/40 tag 0 ncq 131072 in [ 7.890273] res 40/00:10:e0:4f:61/00:00:02:00:00/40 Emask 0x10 (ATA bus error) [ 7.890418] ata1.00: status: { DRDY } [ 7.890472] ata1.00: failed command: READ FPDMA QUEUED [ 7.890530] ata1.00: cmd 60/00:08:90:34:48/01:00:02:00:00/40 tag 1 ncq 131072 in [ 7.890531] res 40/00:10:e0:4f:61/00:00:02:00:00/40 Emask 0x10 (ATA bus error) [ 7.890672] ata1.00: status: { DRDY } [ 7.890724] ata1.00: failed command: READ FPDMA QUEUED [ 7.890781] ata1.00: cmd 60/78:10:e0:4f:61/00:00:02:00:00/40 tag 2 ncq 61440 in [ 7.890782] res 40/00:10:e0:4f:61/00:00:02:00:00/40 Emask 0x10 (ATA bus error) [ 7.890925] ata1.00: status: { DRDY } [ 7.890981] ata1: hard resetting link [ 8.208021] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320) [ 8.230100] ata1.00: configured for UDMA/133 [ 8.230124] ata1: EH complete Looks like the SATA interface tries to use 6Gbps link, then fails miserably and Linux fallbacks to 3Gbps. This is somewhat fine for me, as the system boots successfully each time and works under high load (cd linux-3.2.8; make -j16). I've also ran memtest86+ and it did not find any errors. What concerns me more is that Grub sometimes takes a long time to load the images and/or fails to load itself completely. The error is consistent and is probablistic: that is, each time I boot I have a certain chance to fail. Actually, I have a slight suspiction on the cause of the failure. Look at the cabling: What kind of engineer does it this way? Nah. Even 1Gbps Ethernet hardly tolerates cables bent over a small angle, and there you have 6Gbps SATA. How cound I determine and fix the cause of errors and/or switch the link to 3Gbps mode permanently?

Read the article
Windows Server 2008 R2 network adapter stops working, requires hard reboot

- by Geoff Dalgas

TL;DR version: Turns out this was a Windows Server 2008 R2 kernel networking bug. After siccing Microsoft support on it, we (eventually) got an unpublished kernel hotfix from Microsoft to address it. If you, too, are experiencing mysterious low-level network driver failures requiring a reboot/bluescreen cycle, you might want that hotfix (or maybe Service Pack 1 whenever it is released, too.) We have been using HAProxy along with heartbeat from the Linux-HA project. We are using two linux instances to provide a failover. Each server has with their own public IP and a single IP which is shared between the two using a virtual interface (eth1:1) at IP: 69.59.196.211 The virtual interface (eth1:1) IP 69.59.196.211 is configured as the gateway for the windows servers behind them and we use ip_forwarding to route traffic. We are experiencing an occasional network outage on one of our windows servers behind our linux gateways. HAProxy will detect the server is offline which we can verify by remoting to the failed server and attempting to ping the gateway: Pinging 69.59.196.211 with 32 bytes of data: Reply from 69.59.196.220: Destination host unreachable. Running arp -a on this failed server shows that there is no entry for the gateway address (69.59.196.211): Interface: 69.59.196.220 --- 0xa Internet Address Physical Address Type 69.59.196.161 00-26-88-63-c7-80 dynamic 69.59.196.210 00-15-5d-0a-3e-0e dynamic 69.59.196.212 00-21-5e-4d-45-c9 dynamic 69.59.196.213 00-15-5d-00-b2-0d dynamic 69.59.196.215 00-21-5e-4d-61-1a dynamic 69.59.196.217 00-21-5e-4d-2c-e8 dynamic 69.59.196.219 00-21-5e-4d-38-e5 dynamic 69.59.196.221 00-15-5d-00-b2-0d dynamic 69.59.196.222 00-15-5d-0a-3e-09 dynamic 69.59.196.223 ff-ff-ff-ff-ff-ff static 224.0.0.22 01-00-5e-00-00-16 static 224.0.0.252 01-00-5e-00-00-fc static 225.0.0.1 01-00-5e-00-00-01 static On our linux gateway instances arp -a shows: peak-colo-196-220.peak.org (69.59.196.220) at <incomplete> on eth1 stackoverflow.com (69.59.196.212) at 00:21:5e:4d:45:c9 [ether] on eth1 peak-colo-196-215.peak.org (69.59.196.215) at 00:21:5e:4d:61:1a [ether] on eth1 peak-colo-196-219.peak.org (69.59.196.219) at 00:21:5e:4d:38:e5 [ether] on eth1 peak-colo-196-222.peak.org (69.59.196.222) at 00:15:5d:0a:3e:09 [ether] on eth1 peak-colo-196-209.peak.org (69.59.196.209) at 00:26:88:63:c7:80 [ether] on eth1 peak-colo-196-217.peak.org (69.59.196.217) at 00:21:5e:4d:2c:e8 [ether] on eth1 Why would arp occasionally set the entry for this failed server as <incomplete>? Should we be defining our arp entries statically? I've always left arp alone since it works 99% of the time, but in this one instance it appears to be failing. Are there any additional troubleshooting steps we can take help resolve this issue? THINGS WE HAVE TRIED I added a static arp entry for testing on one of the linux gateways which still didn't help. root@haproxy2:~# arp -a peak-colo-196-215.peak.org (69.59.196.215) at 00:21:5e:4d:61:1a [ether] on eth1 peak-colo-196-221.peak.org (69.59.196.221) at 00:15:5d:00:b2:0d [ether] on eth1 stackoverflow.com (69.59.196.212) at 00:21:5e:4d:45:c9 [ether] on eth1 peak-colo-196-219.peak.org (69.59.196.219) at 00:21:5e:4d:38:e5 [ether] on eth1 peak-colo-196-209.peak.org (69.59.196.209) at 00:26:88:63:c7:80 [ether] on eth1 peak-colo-196-217.peak.org (69.59.196.217) at 00:21:5e:4d:2c:e8 [ether] on eth1 peak-colo-196-220.peak.org (69.59.196.220) at 00:21:5e:4d:30:8d [ether] PERM on eth1 root@haproxy2:~# arp -i eth1 -s 69.59.196.220 00:21:5e:4d:30:8d root@haproxy2:~# ping 69.59.196.220 PING 69.59.196.220 (69.59.196.220) 56(84) bytes of data. --- 69.59.196.220 ping statistics --- 7 packets transmitted, 0 received, 100% packet loss, time 6006ms Rebooting the windows web server solves this issue temporarily with no other changes to the network but our experience shows this issue will come back. Swapping network cards and switches I noticed the link light on the port of the switch for the failed windows server was running at 100Mb instead of 1Gb on the failed interface. I moved the cable to several other open ports and the link indicated 100Mb for each port that I tried. I also swapped the cable with the same result. I tried changing the properties of the network card in windows and the server locked up and required a hard reset after clicking apply. This windows server has two physical network interfaces so I have swapped the cables and network settings on the two interfaces to see if the problem follows the interface. If the public interface goes down again we will know that it is not an issue with the network card. (We also tried another switch we have on hand, no change) Changing network hardware driver versions We've had the same problem with the latest Broadcom driver, as well as the built-in driver that ships in Windows Server 2008 R2. Replacing network cables As a last ditch effort we remembered another change that occurred was the replacement of all of the patch cords between our servers / switch. We had purchased two sets, one green of lengths 1ft - 3ft for the private interfaces and another set of red cables for the public interfaces. We swapped out all of the public interface patch cables with a different brand and ran our servers without issue for a full week ... aaaaaand then the problem recurred. Disable checksum offload, remove TProxy We also tried disabling TCP/IP checksum offload in the driver, no change. We're now pulling out TProxy and moving to a more traditional x-forwarded-for network arrangement without any fancy IP address rewriting. We'll see if that helps. Switch Virtualization providers On the off chance this was related to Hyper-V in some way (we do host Linux VMs on it), we switched to VMWare Server. No change. Switch host model We've reached the end of our troubleshooting rope and are now formally involving Microsoft support. They recommended changing the host model: http://en.wikipedia.org/wiki/Host_model http://technet.microsoft.com/en-us/magazine/2007.09.cableguy.aspx We did that, and.. we'll see.

Read the article
Linux server is only using 60% of memory, then swapping

- by Kamil Kisiel

I've got a Linux server that's running our bacula backup system. The machine is grinding like mad because it's going heavy in to swap. The problem is, it's only using 60% of its physical memory! Here's the output from free -m: free -m total used free shared buffers cached Mem: 3949 2356 1593 0 0 1 -/+ buffers/cache: 2354 1595 Swap: 7629 1804 5824 and some sample output from vmstat 1: procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------ r b swpd free buff cache si so bi bo in cs us sy id wa st 0 2 1843536 1634512 0 4188 54 13 2524 666 2 1 1 1 89 9 0 1 11 1845916 1640724 0 388 2700 4816 221880 4879 14409 170721 4 3 63 30 0 0 9 1846096 1643952 0 0 4956 756 174832 804 12357 159306 3 4 63 30 0 0 11 1846104 1643532 0 0 4916 540 174320 580 10609 139960 3 4 64 29 0 0 4 1846084 1640272 0 2336 4080 524 140408 548 9331 118287 3 4 63 30 0 0 8 1846104 1642096 0 1488 2940 432 102516 457 7023 82230 2 4 65 29 0 0 5 1846104 1642268 0 1276 3704 452 126520 452 9494 119612 3 5 65 27 0 3 12 1846104 1641528 0 328 6092 608 187776 636 8269 113059 4 3 64 29 0 2 2 1846084 1640960 0 724 5948 0 111480 0 7751 116370 4 4 63 29 0 0 4 1846100 1641484 0 404 4144 1476 125760 1500 10668 105358 2 3 71 25 0 0 13 1846104 1641932 0 0 5872 828 153808 840 10518 128447 3 4 70 22 0 0 8 1846096 1639172 0 3164 3556 556 74884 580 5082 65362 2 2 73 23 0 1 4 1846080 1638676 0 396 4512 28 50928 44 2672 38277 2 2 80 16 0 0 3 1846080 1628808 0 7132 2636 0 28004 8 1358 14090 0 1 78 20 0 0 2 1844728 1618552 0 11140 7680 0 12740 8 763 2245 0 0 82 18 0 0 2 1837764 1532056 0 101504 2952 0 95644 24 802 3817 0 1 87 12 0 0 11 1842092 1633324 0 4416 1748 10900 143144 11024 6279 134442 3 3 70 24 0 2 6 1846104 1642756 0 0 4768 468 78752 468 4672 60141 2 2 76 20 0 1 12 1846104 1640792 0 236 4752 440 140712 464 7614 99593 3 5 58 34 0 0 3 1846084 1630368 0 6316 5104 0 20336 0 1703 22424 1 1 72 26 0 2 17 1846104 1638332 0 3168 4080 1720 211960 1744 11977 155886 3 4 65 28 0 1 10 1846104 1640800 0 132 4488 556 126016 584 8016 106368 3 4 63 29 0 0 14 1846104 1639740 0 2248 3436 428 114188 452 7030 92418 3 3 59 35 0 1 6 1846096 1639504 0 1932 5500 436 141412 460 8261 112210 4 4 63 29 0 0 10 1846104 1640164 0 3052 4028 448 147684 472 7366 109554 4 4 61 30 0 0 10 1846100 1641040 0 2332 4952 632 147452 664 8767 118384 3 4 63 30 0 4 8 1846084 1641092 0 664 4948 276 152264 292 6448 98813 5 5 62 28 0 Furthermore, the output of top sorted by CPU time seems to support the theory that swap is what's bogging down the system: top - 09:05:32 up 37 days, 23:24, 1 user, load average: 9.75, 8.24, 7.12 Tasks: 173 total, 1 running, 172 sleeping, 0 stopped, 0 zombie Cpu(s): 1.6%us, 1.4%sy, 0.0%ni, 76.1%id, 20.6%wa, 0.1%hi, 0.2%si, 0.0%st Mem: 4044632k total, 2405628k used, 1639004k free, 0k buffers Swap: 7812492k total, 1851852k used, 5960640k free, 436k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ TIME COMMAND 4174 root 17 0 63156 176 56 S 8 0.0 2138:52 35,38 bacula-fd 4185 root 17 0 63352 284 104 S 6 0.0 1709:25 28,29 bacula-sd 240 root 15 0 0 0 0 D 3 0.0 831:55.19 831:55 kswapd0 2852 root 10 -5 0 0 0 S 1 0.0 126:35.59 126:35 xfsbufd 2849 root 10 -5 0 0 0 S 0 0.0 119:50.94 119:50 xfsbufd 1364 root 10 -5 0 0 0 S 0 0.0 117:05.39 117:05 xfsbufd 21 root 10 -5 0 0 0 S 1 0.0 48:03.44 48:03 events/3 6940 postgres 16 0 43596 8 8 S 0 0.0 46:50.35 46:50 postmaster 1342 root 10 -5 0 0 0 S 0 0.0 23:14.34 23:14 xfsdatad/4 5415 root 17 0 1770m 108 48 S 0 0.0 15:03.74 15:03 bacula-dir 23 root 10 -5 0 0 0 S 0 0.0 13:09.71 13:09 events/5 5604 root 17 0 1216m 500 200 S 0 0.0 12:38.20 12:38 java 5552 root 16 0 1194m 580 248 S 0 0.0 11:58.00 11:58 java Here's the same sorted by virtual memory image size: top - 09:08:32 up 37 days, 23:27, 1 user, load average: 8.43, 8.26, 7.32 Tasks: 173 total, 1 running, 172 sleeping, 0 stopped, 0 zombie Cpu(s): 3.6%us, 3.4%sy, 0.0%ni, 62.2%id, 30.2%wa, 0.2%hi, 0.3%si, 0.0%st Mem: 4044632k total, 2404212k used, 1640420k free, 0k buffers Swap: 7812492k total, 1852548k used, 5959944k free, 100k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ TIME COMMAND 5415 root 17 0 1770m 56 44 S 0 0.0 15:03.78 15:03 bacula-dir 5604 root 17 0 1216m 492 200 S 0 0.0 12:38.30 12:38 java 5552 root 16 0 1194m 476 200 S 0 0.0 11:58.20 11:58 java 4598 root 16 0 117m 44 44 S 0 0.0 0:13.37 0:13 eventmond 9614 gdm 16 0 93188 0 0 S 0 0.0 0:00.30 0:00 gdmgreeter 5527 root 17 0 78716 0 0 S 0 0.0 0:00.30 0:00 gdm 4185 root 17 0 63352 284 104 S 20 0.0 1709:52 28,29 bacula-sd 4174 root 17 0 63156 208 88 S 24 0.0 2139:25 35,39 bacula-fd 10849 postgres 18 0 54740 216 108 D 0 0.0 0:31.40 0:31 postmaster 6661 postgres 17 0 49432 0 0 S 0 0.0 0:03.50 0:03 postmaster 5507 root 15 0 47980 0 0 S 0 0.0 0:00.00 0:00 gdm 6940 postgres 16 0 43596 16 16 S 0 0.0 46:51.39 46:51 postmaster 5304 postgres 16 0 40580 132 88 S 0 0.0 6:21.79 6:21 postmaster 5301 postgres 17 0 40448 24 24 S 0 0.0 0:32.17 0:32 postmaster 11280 root 16 0 40288 28 28 S 0 0.0 0:00.11 0:00 sshd 5534 root 17 0 37580 0 0 S 0 0.0 0:56.18 0:56 X 30870 root 30 15 31668 28 28 S 0 0.0 1:13.38 1:13 snmpd 5305 postgres 17 0 30628 16 16 S 0 0.0 0:11.60 0:11 postmaster 27403 postfix 17 0 30248 0 0 S 0 0.0 0:02.76 0:02 qmgr 10815 postfix 15 0 30208 16 16 S 0 0.0 0:00.02 0:00 pickup 5306 postgres 16 0 29760 20 20 S 0 0.0 0:52.89 0:52 postmaster 5302 postgres 17 0 29628 64 32 S 0 0.0 1:00.64 1:00 postmaster I've tried tuning the swappiness kernel parameter to both high and low values, but nothing appears to change the behavior here. I'm at a loss to figure out what's going on. How can I find out what's causing this? Update: The system is a fully 64-bit system, so there should be no question of memory limitations due to 32-bit issues. Update2: As I mentioned in the original question, I've already tried tuning swappiness to all sorts of values, including 0. The result is always the same, with approximately 1.6 GB of memory remaining unused. Update3: Added top output to the above info.

Read the article
Why is Java EE 6 better than Spring ?

- by arungupta

Java EE 6 was released over 2 years ago and now there are 14 compliant application servers. In all my talks around the world, a question that is frequently asked is Why should I use Java EE 6 instead of Spring ? There are already several blogs covering that topic: Java EE wins over Spring by Bill Burke Why will I use Java EE instead of Spring in new Enterprise Java projects in 2012 ? by Kai Waehner (more discussion on TSS) Spring to Java EE migration (Part 1 and 2, 3 and 4 coming as well) by David Heffelfinger Spring to Java EE - A Migration Experience by Lincoln Baxter Migrating Spring to Java EE 6 by Bert Ertman and Paul Bakker at NLJUG Moving from Spring to Java EE 6 - The Age of Frameworks is Over at TSS Java EE vs Spring Shootout by Rohit Kelapure and Reza Rehman at JavaOne 2011 Java EE 6 and the Ewoks by Murat Yener Definite excuse to avoid Spring forever - Bert Ertman and Arun Gupta I will try to share my perspective in this blog. First of all, I'd like to start with a note: Thank you Spring framework for filling the interim gap and providing functionality that is now included in the mainstream Java EE 6 application servers. The Java EE platform has evolved over the years learning from frameworks like Spring and provides all the functionality to build an enterprise application. Thank you very much Spring framework! While Spring was revolutionary in its time and is still very popular and quite main stream in the same way Struts was circa 2003, it really is last generation's framework - some people are even calling it legacy. However my theory is "code is king". So my approach is to build/take a simple Hello World CRUD application in Java EE 6 and Spring and compare the deployable artifacts. I started looking at the official tutorial Developing a Spring Framework MVC Application Step-by-Step but it is using the older version 2.5. I wasn't able to find any updated version in the current 3.1 release. Next, I downloaded Spring Tool Suite and thought that would provide some template samples to get started. A least a quick search did not show any handy tutorials - either video or text-based. So I searched and found a link to their SVN repository at src.springframework.org/svn/spring-samples/. I tried the "mvc-basic" sample and the generated WAR file was 4.43 MB. While it was named a "basic" sample it seemed to come with 19 different libraries bundled but it was what I could find: ./WEB-INF/lib/aopalliance-1.0.jar./WEB-INF/lib/hibernate-validator-4.1.0.Final.jar./WEB-INF/lib/jcl-over-slf4j-1.6.1.jar./WEB-INF/lib/joda-time-1.6.2.jar./WEB-INF/lib/joda-time-jsptags-1.0.2.jar./WEB-INF/lib/jstl-1.2.jar./WEB-INF/lib/log4j-1.2.16.jar./WEB-INF/lib/slf4j-api-1.6.1.jar./WEB-INF/lib/slf4j-log4j12-1.6.1.jar./WEB-INF/lib/spring-aop-3.0.5.RELEASE.jar./WEB-INF/lib/spring-asm-3.0.5.RELEASE.jar./WEB-INF/lib/spring-beans-3.0.5.RELEASE.jar./WEB-INF/lib/spring-context-3.0.5.RELEASE.jar./WEB-INF/lib/spring-context-support-3.0.5.RELEASE.jar./WEB-INF/lib/spring-core-3.0.5.RELEASE.jar./WEB-INF/lib/spring-expression-3.0.5.RELEASE.jar./WEB-INF/lib/spring-web-3.0.5.RELEASE.jar./WEB-INF/lib/spring-webmvc-3.0.5.RELEASE.jar./WEB-INF/lib/validation-api-1.0.0.GA.jar And it is not even using any database! The app deployed fine on GlassFish 3.1.2 but the "@Controller Example" link did not work as it was missing the context root. With a bit of tweaking I could deploy the application and assume that the account got created because no error was displayed in the browser or server log. Next I generated the WAR for "mvc-ajax" and the 5.1 MB WAR had 20 JARs (1 removed, 2 added): ./WEB-INF/lib/aopalliance-1.0.jar./WEB-INF/lib/hibernate-validator-4.1.0.Final.jar./WEB-INF/lib/jackson-core-asl-1.6.4.jar./WEB-INF/lib/jackson-mapper-asl-1.6.4.jar./WEB-INF/lib/jcl-over-slf4j-1.6.1.jar./WEB-INF/lib/joda-time-1.6.2.jar./WEB-INF/lib/jstl-1.2.jar./WEB-INF/lib/log4j-1.2.16.jar./WEB-INF/lib/slf4j-api-1.6.1.jar./WEB-INF/lib/slf4j-log4j12-1.6.1.jar./WEB-INF/lib/spring-aop-3.0.5.RELEASE.jar./WEB-INF/lib/spring-asm-3.0.5.RELEASE.jar./WEB-INF/lib/spring-beans-3.0.5.RELEASE.jar./WEB-INF/lib/spring-context-3.0.5.RELEASE.jar./WEB-INF/lib/spring-context-support-3.0.5.RELEASE.jar./WEB-INF/lib/spring-core-3.0.5.RELEASE.jar./WEB-INF/lib/spring-expression-3.0.5.RELEASE.jar./WEB-INF/lib/spring-web-3.0.5.RELEASE.jar./WEB-INF/lib/spring-webmvc-3.0.5.RELEASE.jar./WEB-INF/lib/validation-api-1.0.0.GA.jar 2 more JARs for just doing Ajax. Anyway, deploying this application gave the following error: Caused by: java.lang.NoSuchMethodError: org.codehaus.jackson.map.SerializationConfig.<init>(Lorg/codehaus/jackson/map/ClassIntrospector;Lorg/codehaus/jackson/map/AnnotationIntrospector;Lorg/codehaus/jackson/map/introspect/VisibilityChecker;Lorg/codehaus/jackson/map/jsontype/SubtypeResolver;)V at org.springframework.samples.mvc.ajax.json.ConversionServiceAwareObjectMapper.<init>(ConversionServiceAwareObjectMapper.java:20) at org.springframework.samples.mvc.ajax.json.JacksonConversionServiceConfigurer.postProcessAfterInitialization(JacksonConversionServiceConfigurer.java:40) at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyBeanPostProcessorsAfterInitialization(AbstractAutowireCapableBeanFactory.java:407) Seems like some incorrect repos in the "pom.xml". Next one is "mvc-showcase" and the 6.49 MB WAR now has 28 JARs as shown below: ./WEB-INF/lib/aopalliance-1.0.jar./WEB-INF/lib/aspectjrt-1.6.10.jar./WEB-INF/lib/commons-fileupload-1.2.2.jar./WEB-INF/lib/commons-io-2.0.1.jar./WEB-INF/lib/el-api-2.2.jar./WEB-INF/lib/hibernate-validator-4.1.0.Final.jar./WEB-INF/lib/jackson-core-asl-1.8.1.jar./WEB-INF/lib/jackson-mapper-asl-1.8.1.jar./WEB-INF/lib/javax.inject-1.jar./WEB-INF/lib/jcl-over-slf4j-1.6.1.jar./WEB-INF/lib/jdom-1.0.jar./WEB-INF/lib/joda-time-1.6.2.jar./WEB-INF/lib/jstl-api-1.2.jar./WEB-INF/lib/jstl-impl-1.2.jar./WEB-INF/lib/log4j-1.2.16.jar./WEB-INF/lib/rome-1.0.0.jar./WEB-INF/lib/slf4j-api-1.6.1.jar./WEB-INF/lib/slf4j-log4j12-1.6.1.jar./WEB-INF/lib/spring-aop-3.1.0.RELEASE.jar./WEB-INF/lib/spring-asm-3.1.0.RELEASE.jar./WEB-INF/lib/spring-beans-3.1.0.RELEASE.jar./WEB-INF/lib/spring-context-3.1.0.RELEASE.jar./WEB-INF/lib/spring-context-support-3.1.0.RELEASE.jar./WEB-INF/lib/spring-core-3.1.0.RELEASE.jar./WEB-INF/lib/spring-expression-3.1.0.RELEASE.jar./WEB-INF/lib/spring-web-3.1.0.RELEASE.jar./WEB-INF/lib/spring-webmvc-3.1.0.RELEASE.jar./WEB-INF/lib/validation-api-1.0.0.GA.jar The app at least deployed and showed results this time. But still no database! Next I tried building "jpetstore" and got the error: [ERROR] Failed to execute goal on project org.springframework.samples.jpetstore:Could not resolve dependencies for project org.springframework.samples:org.springframework.samples.jpetstore:war:1.0.0-SNAPSHOT: Failed to collect dependencies for [commons-fileupload:commons-fileupload:jar:1.2.1 (compile), org.apache.struts:com.springsource.org.apache.struts:jar:1.2.9 (compile), javax.xml.rpc:com.springsource.javax.xml.rpc:jar:1.1.0 (compile), org.apache.commons:com.springsource.org.apache.commons.dbcp:jar:1.2.2.osgi (compile), commons-io:commons-io:jar:1.3.2 (compile), hsqldb:hsqldb:jar:1.8.0.7 (compile), org.apache.tiles:tiles-core:jar:2.2.0 (compile), org.apache.tiles:tiles-jsp:jar:2.2.0 (compile), org.tuckey:urlrewritefilter:jar:3.1.0 (compile), org.springframework:spring-webmvc:jar:3.0.0.BUILD-SNAPSHOT (compile), org.springframework:spring-orm:jar:3.0.0.BUILD-SNAPSHOT (compile), org.springframework:spring-context-support:jar:3.0.0.BUILD-SNAPSHOT (compile), org.springframework.webflow:spring-js:jar:2.0.7.RELEASE (compile), org.apache.ibatis:com.springsource.com.ibatis:jar:2.3.4.726 (runtime), com.caucho:com.springsource.com.caucho:jar:3.2.1 (compile), org.apache.axis:com.springsource.org.apache.axis:jar:1.4.0 (compile), javax.wsdl:com.springsource.javax.wsdl:jar:1.6.1 (compile), javax.servlet:jstl:jar:1.2 (runtime), org.aspectj:aspectjweaver:jar:1.6.5 (compile), javax.servlet:servlet-api:jar:2.5 (provided), javax.servlet.jsp:jsp-api:jar:2.1 (provided), junit:junit:jar:4.6 (test)]: Failed to read artifact descriptor for org.springframework:spring-webmvc:jar:3.0.0.BUILD-SNAPSHOT: Could not transfer artifact org.springframework:spring-webmvc:pom:3.0.0.BUILD-SNAPSHOT from/to JBoss repository (http://repository.jboss.com/maven2): Access denied to: http://repository.jboss.com/maven2/org/springframework/spring-webmvc/3.0.0.BUILD-SNAPSHOT/spring-webmvc-3.0.0.BUILD-SNAPSHOT.pom It appears the sample is broken - maybe I was pulling from the wrong repository - would be great if someone were to point me at a good target to use here. With a 50% hit on samples in this repository, I started searching through numerous blogs, most of which have either outdated information (using XML-heavy Spring 2.5), some piece of configuration (which is a typical "feature" of Spring) is missing, or too much complexity in the sample. I finally found this blog that worked like a charm. This blog creates a trivial Spring MVC 3 application using Hibernate and MySQL. This application performs CRUD operations on a single table in a database using typical Spring technologies. I downloaded the sample code from the blog, deployed it on GlassFish 3.1.2 and could CRUD the "person" entity. The source code for this application can be downloaded here. More details on the application statistics below. And then I built a similar CRUD application in Java EE 6 using NetBeans wizards in a couple of minutes. The source code for the application can be downloaded here and the WAR here. The Spring Source Tool Suite may also offer similar wizard-driven capabilities but this blog focus primarily on comparing the runtimes. The lack of STS tutorials was slightly disappointing as well. NetBeans however has tons of text-based and video tutorials and tons of material even by the community. One more bit on the download size of tools bundle ... NetBeans 7.1.1 "All" is 211 MB (which includes GlassFish and Tomcat) Spring Tool Suite 2.9.0 is 347 MB (~ 65% bigger) This blog is not about the tooling comparison so back to the Java EE 6 version of the application .... In order to run the Java EE version on GlassFish, copy the MySQL Connector/J to glassfish3/glassfish/domains/domain1/lib/ext directory and create a JDBC connection pool and JDBC resource as: ./bin/asadmin create-jdbc-connection-pool --datasourceclassname \\ com.mysql.jdbc.jdbc2.optional.MysqlDataSource --restype \\ javax.sql.DataSource --property \\ portNumber=3306:user=mysql:password=mysql:databaseName=mydatabase \\ myConnectionPool ./bin/asadmin create-jdbc-resource --connectionpoolid myConnectionPool jdbc/myDataSource I generated WARs for the two projects and the table below highlights some differences between them: Java EE 6 Spring WAR File Size 0.021030 MB 10.87 MB (~516x) Number of files 20 53 (> 2.5x) Bundled libraries 0 36 Total size of libraries 0 12.1 MB XML files 3 5 LoC in XML files 50 (11 + 15 + 24) 129 (27 + 46 + 16 + 11 + 19) (~ 2.5x) Total .properties files 1 Bundle.properties 2 spring.properties, log4j.properties Cold Deploy 5,339 ms 11,724 ms Second Deploy 481 ms 6,261 ms Third Deploy 528 ms 5,484 ms Fourth Deploy 484 ms 5,576 ms Runtime memory ~73 MB ~101 MB Some points worth highlighting from the table ... 516x WAR file, 10x deployment time - With 12.1 MB of libraries (for a very basic application) bundled in your application, the WAR file size and the deployment time will naturally go higher. The WAR file for Spring-based application is 516x bigger and the deployment time is double during the first deployment and ~ 10x during subsequent deployments. The Java EE 6 application is fully portable and will run on any Java EE 6 compliant application server. 36 libraries in the WAR - There are 14 Java EE 6 compliant application servers today. Each of those servers provide all the functionality like transactions, dependency injection, security, persistence, etc typically required of an enterprise or web application. There is no need to bundle 36 libraries worth 12.1 MB for a trivial CRUD application. These 14 compliant application servers provide all the functionality baked in. Now you can also deploy these libraries in the container but then you don't get the "portability" offered by Spring in that case. Does your typical Spring deployment actually do that ? 3x LoC in XML - The number of XML files is about 1.6x and the LoC is ~ 2.5x. So much XML seems circa 2003 when the Java language had no annotations. The XML files can be further reduced, e.g. faces-config.xml can be replaced without providing i18n, but I just want to compare stock applications. Memory usage - Both the applications were deployed on default GlassFish 3.1.2 installation and any additional memory consumed as part of deployment/access was attributed to the application. This is by no means scientific but at least provides an initial ballpark. This area definitely needs more investigation. Another table that compares typical Java EE 6 compliant application servers and the custom-stack created for a Spring application ... Java EE 6 Spring Web Container ? 53 MB (tcServer 2.6.3 Developer Edition) Security ? 12 MB (Spring Security 3.1.0) Persistence ? 6.3 MB (Hibernate 4.1.0, required) Dependency Injection ? 5.3 MB (Framework) Web Services ? 796 KB (Spring WS 2.0.4) Messaging ? 3.4 MB (RabbitMQ Server 2.7.1) 936 KB (Java client 936) OSGi ? 1.3 MB (Spring OSGi 1.2.1) GlassFish and WebLogic (starting at 33 MB) 83.3 MB There are differentiating factors on both the stacks. But most of the functionality like security, persistence, and dependency injection is baked in a Java EE 6 compliant application server but needs to be individually managed and patched for a Spring application. This very quickly leads to a "stack explosion". The Java EE 6 servers are tested extensively on a variety of platforms in different combinations whereas a Spring application developer is responsible for testing with different JDKs, Operating Systems, Versions, Patches, etc. Oracle has both the leading OSS lightweight server with GlassFish and the leading enterprise Java server with WebLogic Server, both Java EE 6 and both with lightweight deployment options. The Web Container offered as part of a Java EE 6 application server not only deploys your enterprise Java applications but also provide operational management, diagnostics, and mission-critical capabilities required by your applications. The Java EE 6 platform also introduced the Web Profile which is a subset of the specifications from the entire platform. It is targeted at developers of modern web applications offering a reasonably complete stack, composed of standard APIs, and is capable out-of-the-box of addressing the needs of a large class of Web applications. As your applications grow, the stack can grow to the full Java EE 6 platform. The GlassFish Server Web Profile starting at 33MB (smaller than just the non-standard tcServer) provides most of the functionality typically required by a web application. WebLogic provides battle-tested functionality for a high throughput, low latency, and enterprise grade web application. No individual managing or patching, all tested and commercially supported for you! Note that VMWare does have a server, tcServer, but it is non-standard and not even certified to the level of the standard Web Profile most customers expect these days. Customers who choose this risk proprietary lock-in since VMWare does not seem to want to formally certify with either Java EE 6 Enterprise Platform or with Java EE 6 Web Profile but of course it would be great if they were to join the community and help their customers reduce the risk of deploying on VMWare software. Some more points to help you decide choose between Java EE 6 and Spring ... Freedom to choose container - There are 14 Java EE 6 compliant application servers today, with a variety of open source and commercial offerings. A Java EE 6 application can be deployed on any of those containers. So if you deployed your application on GlassFish today and would like to scale up with your demands then you can deploy the same application to WebLogic. And because of the portability of a Java EE 6 application, you can even take it a different vendor altogether. Spring requires a runtime which could be any of these app servers as well. But why use Spring when all the required functionality is already baked into the application server itself ? Spring also has a different definition of portability where they claim to bundle all the libraries in the WAR file and move to any application server. But we saw earlier how bloated that archive could be. The equivalent features in Spring runtime offerings (mainly tcServer) are not all open source, not as mature, and often require manual assembly. Vendor choice - The Java EE 6 platform is created using the Java Community Process where all the big players like Oracle, IBM, RedHat, and Apache are conritbuting to make the platform successful. Each application server provides the basic Java EE 6 platform compliance and has its own competitive offerings. This allows you to choose an application server for deploying your Java EE 6 applications. If you are not happy with the support or feature of one vendor then you can move your application to a different vendor because of the portability promise offered by the platform. Spring is a set of products from a single company, one price book, one support organization, one sustaining organization, one sales organization, etc. If any of those cause a customer headache, where do you go ? Java EE, backed by multiple vendors, is a safer bet for those that are risk averse. Production support - With Spring, typically you need to get support from two vendors - VMWare and the container provider. With Java EE 6, all of this is typically provided by one vendor. For example, Oracle offers commercial support from systems, operating systems, JDK, application server, and applications on top of them. VMWare certainly offers complete production support but do you really want to put all your eggs in one basket ? Do you really use tcServer ? ;-) Maintainability - With Spring, you are likely building your own distribution with multiple JAR files, integrating, patching, versioning, etc of all those components. Spring's claim is that multiple JAR files allow you to go à la carte and pick the latest versions of different components. But who is responsible for testing whether all these versions work together ? Yep, you got it, its YOU! If something does not work, who patches and maintains the JARs ? Of course, you! Commercial support for such a configuration ? On your own! The Java EE application servers manage all of this for you and provide a well-tested and commercially supported bundle. While it is always good to realize that there is something new and improved that updates and replaces older frameworks like Spring, the good news is not only does a Java EE 6 container offer what is described here, most also will let you deploy and run your Spring applications on them while you go through an upgrade to a more modern architecture. End result, you get the best of both worlds - keeping your legacy investment but moving to a more agile, lightweight world of Java EE 6. A message to the Spring lovers ... The complexity in J2EE 1.2, 1.3, and 1.4 led to the genesis of Spring but that was in 2004. This is 2012 and the name has changed to "Java EE 6" :-) There are tons of improvements in the Java EE platform to make it easy-to-use and powerful. Some examples: Adding @Stateless on a POJO makes it an EJB EJBs can be packaged in a WAR with no special packaging or deployment descriptors "web.xml" and "faces-config.xml" are optional in most of the common cases Typesafe dependency injection is now part of the Java EE platform Add @Path on a POJO allows you to publish it as a RESTful resource EJBs can be used as backing beans for Facelets-driven JSF pages providing full MVC Java EE 6 WARs are known to be kilobytes in size and deployed in milliseconds Tons of other simplifications in the platform and application servers So if you moved away from J2EE to Spring many years ago and have not looked at Java EE 6 (which has been out since Dec 2009) then you should definitely try it out. Just be at least aware of what other alternatives are available instead of restricting yourself to one stack. Here are some workshops and screencasts worth trying: screencast #37 shows how to build an end-to-end application using NetBeans screencast #36 builds the same application using Eclipse javaee-lab-feb2012.pdf is a 3-4 hours self-paced hands-on workshop that guides you to build a comprehensive Java EE 6 application using NetBeans Each city generally has a "spring cleanup" program every year. It allows you to clean up the mess from your house. For your software projects, you don't need to wait for an annual event, just get started and reduce the technical debt now! Move away from your legacy Spring-based applications to a lighter and more modern approach of building enterprise Java applications using Java EE 6. Watch this beautiful presentation that explains how to migrate from Spring -> Java EE 6: List of files in the Java EE 6 project: ./index.xhtml./META-INF./person./person/Create.xhtml./person/Edit.xhtml./person/List.xhtml./person/View.xhtml./resources./resources/css./resources/css/jsfcrud.css./template.xhtml./WEB-INF./WEB-INF/classes./WEB-INF/classes/Bundle.properties./WEB-INF/classes/META-INF./WEB-INF/classes/META-INF/persistence.xml./WEB-INF/classes/org./WEB-INF/classes/org/javaee./WEB-INF/classes/org/javaee/javaeemysql./WEB-INF/classes/org/javaee/javaeemysql/AbstractFacade.class./WEB-INF/classes/org/javaee/javaeemysql/Person.class./WEB-INF/classes/org/javaee/javaeemysql/Person_.class./WEB-INF/classes/org/javaee/javaeemysql/PersonController$1.class./WEB-INF/classes/org/javaee/javaeemysql/PersonController$PersonControllerConverter.class./WEB-INF/classes/org/javaee/javaeemysql/PersonController.class./WEB-INF/classes/org/javaee/javaeemysql/PersonFacade.class./WEB-INF/classes/org/javaee/javaeemysql/util./WEB-INF/classes/org/javaee/javaeemysql/util/JsfUtil.class./WEB-INF/classes/org/javaee/javaeemysql/util/PaginationHelper.class./WEB-INF/faces-config.xml./WEB-INF/web.xml List of files in the Spring 3.x project: ./META-INF ./META-INF/MANIFEST.MF./WEB-INF./WEB-INF/applicationContext.xml./WEB-INF/classes./WEB-INF/classes/log4j.properties./WEB-INF/classes/org./WEB-INF/classes/org/krams ./WEB-INF/classes/org/krams/tutorial ./WEB-INF/classes/org/krams/tutorial/controller ./WEB-INF/classes/org/krams/tutorial/controller/MainController.class ./WEB-INF/classes/org/krams/tutorial/domain ./WEB-INF/classes/org/krams/tutorial/domain/Person.class ./WEB-INF/classes/org/krams/tutorial/service ./WEB-INF/classes/org/krams/tutorial/service/PersonService.class ./WEB-INF/hibernate-context.xml ./WEB-INF/hibernate.cfg.xml ./WEB-INF/jsp ./WEB-INF/jsp/addedpage.jsp ./WEB-INF/jsp/addpage.jsp ./WEB-INF/jsp/deletedpage.jsp ./WEB-INF/jsp/editedpage.jsp ./WEB-INF/jsp/editpage.jsp ./WEB-INF/jsp/personspage.jsp ./WEB-INF/lib ./WEB-INF/lib/antlr-2.7.6.jar ./WEB-INF/lib/aopalliance-1.0.jar ./WEB-INF/lib/c3p0-0.9.1.2.jar ./WEB-INF/lib/cglib-nodep-2.2.jar ./WEB-INF/lib/commons-beanutils-1.8.3.jar ./WEB-INF/lib/commons-collections-3.2.1.jar ./WEB-INF/lib/commons-digester-2.1.jar ./WEB-INF/lib/commons-logging-1.1.1.jar ./WEB-INF/lib/dom4j-1.6.1.jar ./WEB-INF/lib/ejb3-persistence-1.0.2.GA.jar ./WEB-INF/lib/hibernate-annotations-3.4.0.GA.jar ./WEB-INF/lib/hibernate-commons-annotations-3.1.0.GA.jar ./WEB-INF/lib/hibernate-core-3.3.2.GA.jar ./WEB-INF/lib/javassist-3.7.ga.jar ./WEB-INF/lib/jstl-1.1.2.jar ./WEB-INF/lib/jta-1.1.jar ./WEB-INF/lib/junit-4.8.1.jar ./WEB-INF/lib/log4j-1.2.14.jar ./WEB-INF/lib/mysql-connector-java-5.1.14.jar ./WEB-INF/lib/persistence-api-1.0.jar ./WEB-INF/lib/slf4j-api-1.6.1.jar ./WEB-INF/lib/slf4j-log4j12-1.6.1.jar ./WEB-INF/lib/spring-aop-3.0.5.RELEASE.jar ./WEB-INF/lib/spring-asm-3.0.5.RELEASE.jar ./WEB-INF/lib/spring-beans-3.0.5.RELEASE.jar ./WEB-INF/lib/spring-context-3.0.5.RELEASE.jar ./WEB-INF/lib/spring-context-support-3.0.5.RELEASE.jar ./WEB-INF/lib/spring-core-3.0.5.RELEASE.jar ./WEB-INF/lib/spring-expression-3.0.5.RELEASE.jar ./WEB-INF/lib/spring-jdbc-3.0.5.RELEASE.jar ./WEB-INF/lib/spring-orm-3.0.5.RELEASE.jar ./WEB-INF/lib/spring-tx-3.0.5.RELEASE.jar ./WEB-INF/lib/spring-web-3.0.5.RELEASE.jar ./WEB-INF/lib/spring-webmvc-3.0.5.RELEASE.jar ./WEB-INF/lib/standard-1.1.2.jar ./WEB-INF/lib/xml-apis-1.0.b2.jar ./WEB-INF/spring-servlet.xml ./WEB-INF/spring.properties ./WEB-INF/web.xml So, are you excited about Java EE 6 ? Want to get started now ? Here are some resources: Java EE 6 SDK (including runtime, samples, tutorials etc) GlassFish Server Open Source Edition 3.1.2 (Community) Oracle GlassFish Server 3.1.2 (Commercial) Java EE 6 using WebLogic 12c and NetBeans (Video) Java EE 6 with NetBeans and GlassFish (Video) Java EE with Eclipse and GlassFish (Video)

Read the article
NoSQL with RavenDB and ASP.NET MVC - Part 1

- by shiju

A while back, I have blogged NoSQL with MongoDB, NoRM and ASP.NET MVC Part 1 and Part 2 on how to use MongoDB with an ASP.NET MVC application. The NoSQL movement is getting big attention and RavenDB is the latest addition to the NoSQL and document database world. RavenDB is an Open Source (with a commercial option) document database for the .NET/Windows platform developed by Ayende Rahien. Raven stores schema-less JSON documents, allow you to define indexes using Linq queries and focus on low latency and high performance. RavenDB is .NET focused document database which comes with a fully functional .NET client API and supports LINQ. RavenDB comes with two components, a server and a client API. RavenDB is a REST based system, so you can write your own HTTP cleint API. As a .NET developer, RavenDB is becoming my favorite document database. Unlike other document databases, RavenDB is supports transactions using System.Transactions. Also it's supports both embedded and server mode of database. You can access RavenDB site at http://ravendb.netA demo App with ASP.NET MVCLet's create a simple demo app with RavenDB and ASP.NET MVC. To work with RavenDB, do the following steps. Go to http://ravendb.net/download and download the latest build.Unzip the downloaded file.Go to the /Server directory and run the RavenDB.exe. This will start the RavenDB server listening on localhost:8080You can change the port of RavenDB by modifying the "Raven/Port" appSetting value in the RavenDB.exe.config file.When running the RavenDB, it will automatically create a database in the /Data directory. You can change the directory name data by modifying "Raven/DataDirt" appSetting value in the RavenDB.exe.config file.RavenDB provides a browser based admin tool. When the Raven server is running, You can be access the browser based admin tool and view and edit documents and index using your browser admin tool. The web admin tool available at http://localhost:8080The below is the some screen shots of web admin tool Working with ASP.NET MVC To working with RavenDB in our demo ASP.NET MVC application, do the following steps Step 1 - Add reference to Raven Cleint API In our ASP.NET MVC application, Add a reference to the Raven.Client.Lightweight.dll from the Client directory. Step 2 - Create DocumentStoreThe document store would be created once per application. Let's create a DocumentStore on application start-up in the Global.asax.cs. documentStore = new DocumentStore { Url = "http://localhost:8080/" }; documentStore.Initialise(); The above code will create a Raven DB document store and will be listening the server locahost at port 8080 Step 3 - Create DocumentSession on BeginRequest Let's create a DocumentSession on BeginRequest event in the Global.asax.cs. We are using the document session for every unit of work. In our demo app, every HTTP request would be a single Unit of Work (UoW). BeginRequest += (sender, args) => HttpContext.Current.Items[RavenSessionKey] = documentStore.OpenSession(); Step 4 - Destroy the DocumentSession on EndRequest EndRequest += (o, eventArgs) => { var disposable = HttpContext.Current.Items[RavenSessionKey] as IDisposable; if (disposable != null) disposable.Dispose(); }; At the end of HTTP request, we are destroying the DocumentSession object.The below code block shown all the code in the Global.asax.cs private const string RavenSessionKey = "RavenMVC.Session"; private static DocumentStore documentStore; protected void Application_Start() { //Create a DocumentStore in Application_Start //DocumentStore should be created once per application and stored as a singleton. documentStore = new DocumentStore { Url = "http://localhost:8080/" }; documentStore.Initialise(); AreaRegistration.RegisterAllAreas(); RegisterRoutes(RouteTable.Routes); //DI using Unity 2.0 ConfigureUnity(); } public MvcApplication() { //Create a DocumentSession on BeginRequest //create a document session for every unit of work BeginRequest += (sender, args) => HttpContext.Current.Items[RavenSessionKey] = documentStore.OpenSession(); //Destroy the DocumentSession on EndRequest EndRequest += (o, eventArgs) => { var disposable = HttpContext.Current.Items[RavenSessionKey] as IDisposable; if (disposable != null) disposable.Dispose(); }; } //Getting the current DocumentSession public static IDocumentSession CurrentSession { get { return (IDocumentSession)HttpContext.Current.Items[RavenSessionKey]; } } We have setup all necessary code in the Global.asax.cs for working with RavenDB. For our demo app, Let’s write a domain class public class Category { public string Id { get; set; } [Required(ErrorMessage = "Name Required")] [StringLength(25, ErrorMessage = "Must be less than 25 characters")] public string Name { get; set;} public string Description { get; set; } } We have created simple domain entity Category. Let's create repository class for performing CRUD operations against our domain entity Category. public interface ICategoryRepository { Category Load(string id); IEnumerable<Category> GetCategories(); void Save(Category category); void Delete(string id); } public class CategoryRepository : ICategoryRepository { private IDocumentSession session; public CategoryRepository() { session = MvcApplication.CurrentSession; } //Load category based on Id public Category Load(string id) { return session.Load<Category>(id); } //Get all categories public IEnumerable<Category> GetCategories() { var categories= session.LuceneQuery<Category>() .WaitForNonStaleResults() .ToArray(); return categories; } //Insert/Update category public void Save(Category category) { if (string.IsNullOrEmpty(category.Id)) { //insert new record session.Store(category); } else { //edit record var categoryToEdit = Load(category.Id); categoryToEdit.Name = category.Name; categoryToEdit.Description = category.Description; } //save the document session session.SaveChanges(); } //delete a category public void Delete(string id) { var category = Load(id); session.Delete<Category>(category); session.SaveChanges(); } } For every CRUD operations, we are taking the current document session object from HttpContext object. session = MvcApplication.CurrentSession; We are calling the static method CurrentSession from the Global.asax.cs public static IDocumentSession CurrentSession { get { return (IDocumentSession)HttpContext.Current.Items[RavenSessionKey]; } } Retrieve Entities The Load method get the single Category object based on the Id. RavenDB is working based on the REST principles and the Id would be like categories/1. The Id would be created by automatically when a new object is inserted to the document store. The REST uri categories/1 represents a single category object with Id representation of 1. public Category Load(string id) { return session.Load<Category>(id); } The GetCategories method returns all the categories calling the session.LuceneQuery method. RavenDB is using a lucen query syntax for querying. I will explain more details about querying and indexing in my future posts. public IEnumerable<Category> GetCategories() { var categories= session.LuceneQuery<Category>() .WaitForNonStaleResults() .ToArray(); return categories; } Insert/Update entityFor insert/Update a Category entity, we have created Save method in repository class. If the Id property of Category is null, we call Store method of Documentsession for insert a new record. For editing a existing record, we load the Category object and assign the values to the loaded Category object. The session.SaveChanges() will save the changes to document store. //Insert/Update category public void Save(Category category) { if (string.IsNullOrEmpty(category.Id)) { //insert new record session.Store(category); } else { //edit record var categoryToEdit = Load(category.Id); categoryToEdit.Name = category.Name; categoryToEdit.Description = category.Description; } //save the document session session.SaveChanges(); } Delete Entity In the Delete method, we call the document session's delete method and call the SaveChanges method to reflect changes in the document store. public void Delete(string id) { var category = Load(id); session.Delete<Category>(category); session.SaveChanges(); } Let’s create ASP.NET MVC controller and controller actions for handling CRUD operations for the domain class Category public class CategoryController : Controller { private ICategoryRepository categoyRepository; //DI enabled constructor public CategoryController(ICategoryRepository categoyRepository) { this.categoyRepository = categoyRepository; } public ActionResult Index() { var categories = categoyRepository.GetCategories(); if (categories == null) return RedirectToAction("Create"); return View(categories); } [HttpGet] public ActionResult Edit(string id) { var category = categoyRepository.Load(id); return View("Save",category); } // GET: /Category/Create [HttpGet] public ActionResult Create() { var category = new Category(); return View("Save", category); } [HttpPost] public ActionResult Save(Category category) { if (!ModelState.IsValid) { return View("Save", category); } categoyRepository.Save(category); return RedirectToAction("Index"); } [HttpPost] public ActionResult Delete(string id) { categoyRepository.Delete(id); var categories = categoyRepository.GetCategories(); return PartialView("CategoryList", categories); } } RavenDB is an awesome document database and I hope that it will be the winner in .NET space of document database world. The source code of demo application available at http://ravenmvc.codeplex.com/

Read the article
WCF WS-Security and WSE Nonce Authentication

- by Rick Strahl

WCF makes it fairly easy to access WS-* Web Services, except when you run into a service format that it doesn't support. Even then WCF provides a huge amount of flexibility to make the service clients work, however finding the proper interfaces to make that happen is not easy to discover and for the most part undocumented unless you're lucky enough to run into a blog, forum or StackOverflow post on the matter. This is definitely true for the Password Nonce as part of the WS-Security/WSE protocol, which is not natively supported in WCF. Specifically I had a need to create a WCF message on the client that includes a WS-Security header that looks like this from their spec document:<soapenv:Header> <wsse:Security soapenv:mustUnderstand="1" xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd"> <wsse:UsernameToken wsu:Id="UsernameToken-8" xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd"> <wsse:Username>TeStUsErNaMe1</wsse:Username> <wsse:Password Type="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-username-token-profile-1.0#PasswordText" >TeStPaSsWoRd1</wsse:Password> <wsse:Nonce EncodingType="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-soap-message-security-1.0#Base64Binary" >f8nUe3YupTU5ISdCy3X9Gg==</wsse:Nonce> <wsu:Created>2011-05-04T19:01:40.981Z</wsu:Created> </wsse:UsernameToken> </wsse:Security> </soapenv:Header> Specifically, the Nonce and Created keys are what WCF doesn't create or have a built in formatting for. Why is there a nonce? My first thought here was WTF? The username and password are there in clear text, what does the Nonce accomplish? The Nonce and created keys are are part of WSE Security specification and are meant to allow the server to detect and prevent replay attacks. The hashed nonce should be unique per request which the server can store and check for before running another request thus ensuring that a request is not replayed with exactly the same values. Basic ServiceUtl Import - not much Luck The first thing I did when I imported this service with a service reference was to simply import it as a Service Reference. The Add Service Reference import automatically detects that WS-Security is required and appropariately adds the WS-Security to the basicHttpBinding in the config file:<?xml version="1.0" encoding="utf-8" ?> <configuration> <system.serviceModel> <bindings> <basicHttpBinding> <binding name="RealTimeOnlineSoapBinding"> <security mode="Transport" /> </binding> <binding name="RealTimeOnlineSoapBinding1" /> </basicHttpBinding> </bindings> <client> <endpoint address="https://notarealurl.com:443/services/RealTimeOnline" binding="basicHttpBinding" bindingConfiguration="RealTimeOnlineSoapBinding" contract="RealTimeOnline.RealTimeOnline" name="RealTimeOnline" /> </client> </system.serviceModel> </configuration> If if I run this as is using code like this:var client = new RealTimeOnlineClient(); client.ClientCredentials.UserName.UserName = "TheUsername"; client.ClientCredentials.UserName.Password = "ThePassword"; … I get nothing in terms of WS-Security headers. The request is sent, but the the binding expects transport level security to be applied, rather than message level security. To fix this so that a WS-Security message header is sent the security mode can be changed to: <security mode="TransportWithMessageCredential" /> Now if I re-run I at least get a WS-Security header which looks like this:<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/" xmlns:u="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd"> <s:Header> <o:Security s:mustUnderstand="1" xmlns:o="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd"> <u:Timestamp u:Id="_0"> <u:Created>2012-11-24T02:55:18.011Z</u:Created> <u:Expires>2012-11-24T03:00:18.011Z</u:Expires> </u:Timestamp> <o:UsernameToken u:Id="uuid-18c215d4-1106-40a5-8dd1-c81fdddf19d3-1"> <o:Username>TheUserName</o:Username> <o:Password Type="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-username-token-profile-1.0#PasswordText" >ThePassword</o:Password> </o:UsernameToken> </o:Security> </s:Header> Closer! Now the WS-Security header is there along with a timestamp field (which might not be accepted by some WS-Security expecting services), but there's no Nonce or created timestamp as required by my original service. Using a CustomBinding instead My next try was to go with a CustomBinding instead of basicHttpBinding as it allows a bit more control over the protocol and transport configurations for the binding. Specifically I can explicitly specify the message protocol(s) used. Using configuration file settings here's what the config file looks like:<?xml version="1.0"?> <configuration> <system.serviceModel> <bindings> <customBinding> <binding name="CustomSoapBinding"> <security includeTimestamp="false" authenticationMode="UserNameOverTransport" defaultAlgorithmSuite="Basic256" requireDerivedKeys="false" messageSecurityVersion="WSSecurity10WSTrustFebruary2005WSSecureConversationFebruary2005WSSecurityPolicy11BasicSecurityProfile10"> </security> <textMessageEncoding messageVersion="Soap11"></textMessageEncoding> <httpsTransport maxReceivedMessageSize="2000000000"/> </binding> </customBinding> </bindings> <client> <endpoint address="https://notrealurl.com:443/services/RealTimeOnline" binding="customBinding" bindingConfiguration="CustomSoapBinding" contract="RealTimeOnline.RealTimeOnline" name="RealTimeOnline" /> </client> </system.serviceModel> <startup> <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.0"/> </startup> </configuration> This ends up creating a cleaner header that's missing the timestamp field which can cause some services problems. The WS-Security header output generated with the above looks like this:<s:Header> <o:Security s:mustUnderstand="1" xmlns:o="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd"> <o:UsernameToken u:Id="uuid-291622ca-4c11-460f-9886-ac1c78813b24-1"> <o:Username>TheUsername</o:Username> <o:Password Type="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-username-token-profile-1.0#PasswordText" >ThePassword</o:Password> </o:UsernameToken> </o:Security> </s:Header> This is closer as it includes only the username and password. The key here is the protocol for WS-Security:messageSecurityVersion="WSSecurity10WSTrustFebruary2005WSSecureConversationFebruary2005WSSecurityPolicy11BasicSecurityProfile10" which explicitly specifies the protocol version. There are several variants of this specification but none of them seem to support the nonce unfortunately. This protocol does allow for optional omission of the Nonce and created timestamp provided (which effectively makes those keys optional). With some services I tried that requested a Nonce just using this protocol actually worked where the default basicHttpBinding failed to connect, so this is a possible solution for access to some services. Unfortunately for my target service that was not an option. The nonce has to be there. Creating Custom ClientCredentials As it turns out WCF doesn't have support for the Digest Nonce as part of WS-Security, and so as far as I can tell there's no way to do it just with configuration settings. I did a bunch of research on this trying to find workarounds for this, and I did find a couple of entries on StackOverflow as well as on the MSDN forums. However, none of these are particularily clear and I ended up using bits and pieces of several of them to arrive at a working solution in the end. http://stackoverflow.com/questions/896901/wcf-adding-nonce-to-usernametoken http://social.msdn.microsoft.com/Forums/en-US/wcf/thread/4df3354f-0627-42d9-b5fb-6e880b60f8ee The latter forum message is the more useful of the two (the last message on the thread in particular) and it has most of the information required to make this work. But it took some experimentation for me to get this right so I'll recount the process here maybe a bit more comprehensively. In order for this to work a number of classes have to be overridden: ClientCredentials ClientCredentialsSecurityTokenManager WSSecurityTokenizer The idea is that we need to create a custom ClientCredential class to hold the custom properties so they can be set from the UI or via configuration settings. The TokenManager and Tokenizer are mainly required to allow the custom credentials class to flow through the WCF pipeline and eventually provide custom serialization. Here are the three classes required and their full implementations:public class CustomCredentials : ClientCredentials { public CustomCredentials() { } protected CustomCredentials(CustomCredentials cc) : base(cc) { } public override System.IdentityModel.Selectors.SecurityTokenManager CreateSecurityTokenManager() { return new CustomSecurityTokenManager(this); } protected override ClientCredentials CloneCore() { return new CustomCredentials(this); } } public class CustomSecurityTokenManager : ClientCredentialsSecurityTokenManager { public CustomSecurityTokenManager(CustomCredentials cred) : base(cred) { } public override System.IdentityModel.Selectors.SecurityTokenSerializer CreateSecurityTokenSerializer(System.IdentityModel.Selectors.SecurityTokenVersion version) { return new CustomTokenSerializer(System.ServiceModel.Security.SecurityVersion.WSSecurity11); } } public class CustomTokenSerializer : WSSecurityTokenSerializer { public CustomTokenSerializer(SecurityVersion sv) : base(sv) { } protected override void WriteTokenCore(System.Xml.XmlWriter writer, System.IdentityModel.Tokens.SecurityToken token) { UserNameSecurityToken userToken = token as UserNameSecurityToken; string tokennamespace = "o"; DateTime created = DateTime.Now; string createdStr = created.ToString("yyyy-MM-ddThh:mm:ss.fffZ"); // unique Nonce value - encode with SHA-1 for 'randomness' // in theory the nonce could just be the GUID by itself string phrase = Guid.NewGuid().ToString(); var nonce = GetSHA1String(phrase); // in this case password is plain text // for digest mode password needs to be encoded as: // PasswordAsDigest = Base64(SHA-1(Nonce + Created + Password)) // and profile needs to change to //string password = GetSHA1String(nonce + createdStr + userToken.Password); string password = userToken.Password; writer.WriteRaw(string.Format( "<{0}:UsernameToken u:Id=\"" + token.Id + "\" xmlns:u=\"http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd\">" + "<{0}:Username>" + userToken.UserName + "</{0}:Username>" + "<{0}:Password Type=\"http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-username-token-profile-1.0#PasswordText\">" + password + "</{0}:Password>" + "<{0}:Nonce EncodingType=\"http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-soap-message-security-1.0#Base64Binary\">" + nonce + "</{0}:Nonce>" + "<u:Created>" + createdStr + "</u:Created></{0}:UsernameToken>", tokennamespace)); } protected string GetSHA1String(string phrase) { SHA1CryptoServiceProvider sha1Hasher = new SHA1CryptoServiceProvider(); byte[] hashedDataBytes = sha1Hasher.ComputeHash(Encoding.UTF8.GetBytes(phrase)); return Convert.ToBase64String(hashedDataBytes); } } Realistically only the CustomTokenSerializer has any significant code in. The code there deals with actually serializing the custom credentials using low level XML semantics by writing output into an XML writer. I can't take credit for this code - most of the code comes from the MSDN forum post mentioned earlier - I made a few adjustments to simplify the nonce generation and also added some notes to allow for PasswordDigest generation. Per spec the nonce is nothing more than a unique value that's supposed to be 'random'. I'm thinking that this value can be any string that's unique and a GUID on its own probably would have sufficed. Comments on other posts that GUIDs can be potentially guessed are highly exaggerated to say the least IMHO. To satisfy even that aspect though I added the SHA1 encryption and binary decoding to give a more random value that would be impossible to 'guess'. The original example from the forum post used another level of encoding and decoding to string in between - but that really didn't accomplish anything but extra overhead. The header output generated from this looks like this:<s:Header> <o:Security s:mustUnderstand="1" xmlns:o="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd"> <o:UsernameToken u:Id="uuid-f43d8b0d-0ebb-482e-998d-f544401a3c91-1" xmlns:u="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd"> <o:Username>TheUsername</o:Username> <o:Password Type="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-username-token-profile-1.0#PasswordText">ThePassword</o:Password> <o:Nonce EncodingType="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-soap-message-security-1.0#Base64Binary" >PjVE24TC6HtdAnsf3U9c5WMsECY=</o:Nonce> <u:Created>2012-11-23T07:10:04.670Z</u:Created> </o:UsernameToken> </o:Security> </s:Header> which is exactly as it should be. Password Digest? In my case the password is passed in plain text over an SSL connection, so there's no digest required so I was done with the code above. Since I don't have a service handy that requires a password digest, I had no way of testing the code for the digest implementation, but here is how this is likely to work. If you need to pass a digest encoded password things are a little bit trickier. The password type namespace needs to change to: http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-username-token-profile-1.0#Digest and then the password value needs to be encoded. The format for password digest encoding is this: Base64(SHA-1(Nonce + Created + Password)) and it can be handled in the code above with this code (that's commented in the snippet above): string password = GetSHA1String(nonce + createdStr + userToken.Password); The entire WriteTokenCore method for digest code looks like this:protected override void WriteTokenCore(System.Xml.XmlWriter writer, System.IdentityModel.Tokens.SecurityToken token) { UserNameSecurityToken userToken = token as UserNameSecurityToken; string tokennamespace = "o"; DateTime created = DateTime.Now; string createdStr = created.ToString("yyyy-MM-ddThh:mm:ss.fffZ"); // unique Nonce value - encode with SHA-1 for 'randomness' // in theory the nonce could just be the GUID by itself string phrase = Guid.NewGuid().ToString(); var nonce = GetSHA1String(phrase); string password = GetSHA1String(nonce + createdStr + userToken.Password); writer.WriteRaw(string.Format( "<{0}:UsernameToken u:Id=\"" + token.Id + "\" xmlns:u=\"http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd\">" + "<{0}:Username>" + userToken.UserName + "</{0}:Username>" + "<{0}:Password Type=\"http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-username-token-profile-1.0#Digest\">" + password + "</{0}:Password>" + "<{0}:Nonce EncodingType=\"http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-soap-message-security-1.0#Base64Binary\">" + nonce + "</{0}:Nonce>" + "<u:Created>" + createdStr + "</u:Created></{0}:UsernameToken>", tokennamespace)); } I had no service to connect to to try out Digest auth - if you end up needing it and get it to work please drop a comment… How to use the custom Credentials The easiest way to use the custom credentials is to create the client in code. Here's a factory method I use to create an instance of my service client: public static RealTimeOnlineClient CreateRealTimeOnlineProxy(string url, string username, string password) { if (string.IsNullOrEmpty(url)) url = "https://notrealurl.com:443/cows/services/RealTimeOnline"; CustomBinding binding = new CustomBinding(); var security = TransportSecurityBindingElement.CreateUserNameOverTransportBindingElement(); security.IncludeTimestamp = false; security.DefaultAlgorithmSuite = SecurityAlgorithmSuite.Basic256; security.MessageSecurityVersion = MessageSecurityVersion.WSSecurity10WSTrustFebruary2005WSSecureConversationFebruary2005WSSecurityPolicy11BasicSecurityProfile10; var encoding = new TextMessageEncodingBindingElement(); encoding.MessageVersion = MessageVersion.Soap11; var transport = new HttpsTransportBindingElement(); transport.MaxReceivedMessageSize = 20000000; // 20 megs binding.Elements.Add(security); binding.Elements.Add(encoding); binding.Elements.Add(transport); RealTimeOnlineClient client = new RealTimeOnlineClient(binding, new EndpointAddress(url)); // to use full client credential with Nonce uncomment this code: // it looks like this might not be required - the service seems to work without it client.ChannelFactory.Endpoint.Behaviors.Remove<System.ServiceModel.Description.ClientCredentials>(); client.ChannelFactory.Endpoint.Behaviors.Add(new CustomCredentials()); client.ClientCredentials.UserName.UserName = username; client.ClientCredentials.UserName.Password = password; return client; } This returns a service client that's ready to call other service methods. The key item in this code is the ChannelFactory endpoint behavior modification that that first removes the original ClientCredentials and then adds the new one. The ClientCredentials property on the client is read only and this is the way it has to be added. Summary It's a bummer that WCF doesn't suport WSE Security authentication with nonce values out of the box. From reading the comments in posts/articles while I was trying to find a solution, I found that this feature was omitted by design as this protocol is considered unsecure. While I agree that plain text passwords are rarely a good idea even if they go over secured SSL connection as WSE Security does, there are unfortunately quite a few services (mosly Java services I suspect) that use this protocol. I've run into this twice now and trying to find a solution online I can see that this is not an isolated problem - many others seem to have struggled with this. It seems there are about a dozen questions about this on StackOverflow all with varying incomplete answers. Hopefully this post provides a little more coherent content in one place. Again I marvel at WCF and its breadth of support for protocol features it has in a single tool. And even when it can't handle something there are ways to get it working via extensibility. But at the same time I marvel at how freaking difficult it is to arrive at these solutions. I mean there's no way I could have ever figured this out on my own. It takes somebody working on the WCF team or at least being very, very intricately involved in the innards of WCF to figure out the interconnection of the various objects to do this from scratch. Luckily this is an older problem that has been discussed extensively online and I was able to cobble together a solution from the online content. I'm glad it worked out that way, but it feels dirty and incomplete in that there's a whole learning path that was omitted to get here… Man am I glad I'm not dealing with SOAP services much anymore. REST service security - even when using some sort of federation is a piece of cake by comparison :-) I'm sure once standards bodies gets involved we'll be right back in security standard hell…© Rick Strahl, West Wind Technologies, 2005-2012Posted in WCF Web Services Tweet !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs"); (function() { var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = 'https://apis.google.com/js/plusone.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); })();

Read the article
West Wind WebSurge - an easy way to Load Test Web Applications

- by Rick Strahl

A few months ago on a project the subject of load testing came up. We were having some serious issues with a Web application that would start spewing SQL lock errors under somewhat heavy load. These sort of errors can be tough to catch, precisely because they only occur under load and not during typical development testing. To replicate this error more reliably we needed to put a load on the application and run it for a while before these SQL errors would flare up. It’s been a while since I’d looked at load testing tools, so I spent a bit of time looking at different tools and frankly didn’t really find anything that was a good fit. A lot of tools were either a pain to use, didn’t have the basic features I needed, or are extravagantly expensive. In the end I got frustrated enough to build an initially small custom load test solution that then morphed into a more generic library, then gained a console front end and eventually turned into a full blown Web load testing tool that is now called West Wind WebSurge. I got seriously frustrated looking for tools every time I needed some quick and dirty load testing for an application. If my aim is to just put an application under heavy enough load to find a scalability problem in code, or to simply try and push an application to its limits on the hardware it’s running I shouldn’t have to have to struggle to set up tests. It should be easy enough to get going in a few minutes, so that the testing can be set up quickly so that it can be done on a regular basis without a lot of hassle. And that was the goal when I started to build out my initial custom load tester into a more widely usable tool. If you’re in a hurry and you want to check it out, you can find more information and download links here: West Wind WebSurge Product Page Walk through Video Download link (zip) Install from Chocolatey Source on GitHub For a more detailed discussion of the why’s and how’s and some background continue reading. How did I get here? When I started out on this path, I wasn’t planning on building a tool like this myself – but I got frustrated enough looking at what’s out there to think that I can do better than what’s available for the most common simple load testing scenarios. When we ran into the SQL lock problems I mentioned, I started looking around what’s available for Web load testing solutions that would work for our whole team which consisted of a few developers and a couple of IT guys both of which needed to be able to run the tests. It had been a while since I looked at tools and I figured that by now there should be some good solutions out there, but as it turns out I didn’t really find anything that fit our relatively simple needs without costing an arm and a leg… I spent the better part of a day installing and trying various load testing tools and to be frank most of them were either terrible at what they do, incredibly unfriendly to use, used some terminology I couldn’t even parse, or were extremely expensive (and I mean in the ‘sell your liver’ range of expensive). Pick your poison. There are also a number of online solutions for load testing and they actually looked more promising, but those wouldn’t work well for our scenario as the application is running inside of a private VPN with no outside access into the VPN. Most of those online solutions also ended up being very pricey as well – presumably because of the bandwidth required to test over the open Web can be enormous. When I asked around on Twitter what people were using– I got mostly… crickets. Several people mentioned Visual Studio Load Test, and most other suggestions pointed to online solutions. I did get a bunch of responses though with people asking to let them know what I found – apparently I’m not alone when it comes to finding load testing tools that are effective and easy to use. As to Visual Studio, the higher end skus of Visual Studio and the test edition include a Web load testing tool, which is quite powerful, but there are a number of issues with that: First it’s tied to Visual Studio so it’s not very portable – you need a VS install. I also find the test setup and terminology used by the VS test runner extremely confusing. Heck, it’s complicated enough that there’s even a Pluralsight course on using the Visual Studio Web test from Steve Smith. And of course you need to have one of the high end Visual Studio Skus, and those are mucho Dinero ($$$) – just for the load testing that’s rarely an option. Some of the tools are ultra extensive and let you run analysis tools on the target serves which is useful, but in most cases – just plain overkill and only distracts from what I tend to be ultimately interested in: Reproducing problems that occur at high load, and finding the upper limits and ‘what if’ scenarios as load is ramped up increasingly against a site. Yes it’s useful to have Web app instrumentation, but often that’s not what you’re interested in. I still fondly remember early days of Web testing when Microsoft had the WAST (Web Application Stress Tool) tool, which was rather simple – and also somewhat limited – but easily allowed you to create stress tests very quickly. It had some serious limitations (mainly that it didn’t work with SSL), but the idea behind it was excellent: Create tests quickly and easily and provide a decent engine to run it locally with minimal setup. You could get set up and run tests within a few minutes. Unfortunately, that tool died a quiet death as so many of Microsoft’s tools that probably were built by an intern and then abandoned, even though there was a lot of potential and it was actually fairly widely used. Eventually the tools was no longer downloadable and now it simply doesn’t work anymore on higher end hardware. West Wind Web Surge – Making Load Testing Quick and Easy So I ended up creating West Wind WebSurge out of rebellious frustration… The goal of WebSurge is to make it drop dead simple to create load tests. It’s super easy to capture sessions either using the built in capture tool (big props to Eric Lawrence, Telerik and FiddlerCore which made that piece a snap), using the full version of Fiddler and exporting sessions, or by manually or programmatically creating text files based on plain HTTP headers to create requests. I’ve been using this tool for 4 months now on a regular basis on various projects as a reality check for performance and scalability and it’s worked extremely well for finding small performance issues. I also use it regularly as a simple URL tester, as it allows me to quickly enter a URL plus headers and content and test that URL and its results along with the ability to easily save one or more of those URLs. A few weeks back I made a walk through video that goes over most of the features of WebSurge in some detail: Note that the UI has slightly changed since then, so there are some UI improvements. Most notably the test results screen has been updated recently to a different layout and to provide more information about each URL in a session at a glance. The video and the main WebSurge site has a lot of info of basic operations. For the rest of this post I’ll talk about a few deeper aspects that may be of interest while also giving a glance at how WebSurge works. Session Capturing As you would expect, WebSurge works with Sessions of Urls that are played back under load. Here’s what the main Session View looks like: You can create session entries manually by individually adding URLs to test (on the Request tab on the right) and saving them, or you can capture output from Web Browsers, Windows Desktop applications that call services, your own applications using the built in Capture tool. With this tool you can capture anything HTTP -SSL requests and content from Web pages, AJAX calls, SOAP or REST services – again anything that uses Windows or .NET HTTP APIs. Behind the scenes the capture tool uses FiddlerCore so basically anything you can capture with Fiddler you can also capture with Web Surge Session capture tool. Alternately you can actually use Fiddler as well, and then export the captured Fiddler trace to a file, which can then be imported into WebSurge. This is a nice way to let somebody capture session without having to actually install WebSurge or for your customers to provide an exact playback scenario for a given set of URLs that cause a problem perhaps. Note that not all applications work with Fiddler’s proxy unless you configure a proxy. For example, .NET Web applications that make HTTP calls usually don’t show up in Fiddler by default. For those .NET applications you can explicitly override proxy settings to capture those requests to service calls. The capture tool also has handy optional filters that allow you to filter by domain, to help block out noise that you typically don’t want to include in your requests. For example, if your pages include links to CDNs, or Google Analytics or social links you typically don’t want to include those in your load test, so by capturing just from a specific domain you are guaranteed content from only that one domain. Additionally you can provide url filters in the configuration file – filters allow to provide filter strings that if contained in a url will cause requests to be ignored. Again this is useful if you don’t filter by domain but you want to filter out things like static image, css and script files etc. Often you’re not interested in the load characteristics of these static and usually cached resources as they just add noise to tests and often skew the overall url performance results. In my testing I tend to care only about my dynamic requests. SSL Captures require Fiddler Note, that in order to capture SSL requests you’ll have to install the Fiddler’s SSL certificate. The easiest way to do this is to install Fiddler and use its SSL configuration options to get the certificate into the local certificate store. There’s a document on the Telerik site that provides the exact steps to get SSL captures to work with Fiddler and therefore with WebSurge. Session Storage A group of URLs entered or captured make up a Session. Sessions can be saved and restored easily as they use a very simple text format that simply stored on disk. The format is slightly customized HTTP header traces separated by a separator line. The headers are standard HTTP headers except that the full URL instead of just the domain relative path is stored as part of the 1st HTTP header line for easier parsing. Because it’s just text and uses the same format that Fiddler uses for exports, it’s super easy to create Sessions by hand manually or under program control writing out to a simple text file. You can see what this format looks like in the Capture window figure above – the raw captured format is also what’s stored to disk and what WebSurge parses from. The only ‘custom’ part of these headers is that 1st line contains the full URL instead of the domain relative path and Host: header. The rest of each header are just plain standard HTTP headers with each individual URL isolated by a separator line. The format used here also uses what Fiddler produces for exports, so it’s easy to exchange or view data either in Fiddler or WebSurge. Urls can also be edited interactively so you can modify the headers easily as well: Again – it’s just plain HTTP headers so anything you can do with HTTP can be added here. Use it for single URL Testing Incidentally I’ve also found this form as an excellent way to test and replay individual URLs for simple non-load testing purposes. Because you can capture a single or many URLs and store them on disk, this also provides a nice HTTP playground where you can record URLs with their headers, and fire them one at a time or as a session and see results immediately. It’s actually an easy way for REST presentations and I find the simple UI flow actually easier than using Fiddler natively. Finally you can save one or more URLs as a session for later retrieval. I’m using this more and more for simple URL checks. Overriding Cookies and Domains Speaking of HTTP headers – you can also overwrite cookies used as part of the options. One thing that happens with modern Web applications is that you have session cookies in use for authorization. These cookies tend to expire at some point which would invalidate a test. Using the Options dialog you can actually override the cookie: which replaces the cookie for all requests with the cookie value specified here. You can capture a valid cookie from a manual HTTP request in your browser and then paste into the cookie field, to replace the existing Cookie with the new one that is now valid. Likewise you can easily replace the domain so if you captured urls on west-wind.com and now you want to test on localhost you can do that easily easily as well. You could even do something like capture on store.west-wind.com and then test on localhost/store which would also work. Running Load Tests Once you’ve created a Session you can specify the length of the test in seconds, and specify the number of simultaneous threads to run each session on. Sessions run through each of the URLs in the session sequentially by default. One option in the options list above is that you can also randomize the URLs so each thread runs requests in a different order. This avoids bunching up URLs initially when tests start as all threads run the same requests simultaneously which can sometimes skew the results of the first few minutes of a test. While sessions run some progress information is displayed: By default there’s a live view of requests displayed in a Console-like window. On the bottom of the window there’s a running total summary that displays where you’re at in the test, how many requests have been processed and what the requests per second count is currently for all requests. Note that for tests that run over a thousand requests a second it’s a good idea to turn off the console display. While the console display is nice to see that something is happening and also gives you slight idea what’s happening with actual requests, once a lot of requests are processed, this UI updating actually adds a lot of CPU overhead to the application which may cause the actual load generated to be reduced. If you are running a 1000 requests a second there’s not much to see anyway as requests roll by way too fast to see individual lines anyway. If you look on the options panel, there is a NoProgressEvents option that disables the console display. Note that the summary display is still updated approximately once a second so you can always tell that the test is still running. Test Results When the test is done you get a simple Results display: On the right you get an overall summary as well as breakdown by each URL in the session. Both success and failures are highlighted so it’s easy to see what’s breaking in your load test. The report can be printed or you can also open the HTML document in your default Web Browser for printing to PDF or saving the HTML document to disk. The list on the right shows you a partial list of the URLs that were fired so you can look in detail at the request and response data. The list can be filtered by success and failure requests. Each list is partial only (at the moment) and limited to a max of 1000 items in order to render reasonably quickly. Each item in the list can be clicked to see the full request and response data: This particularly useful for errors so you can quickly see and copy what request data was used and in the case of a GET request you can also just click the link to quickly jump to the page. For non-GET requests you can find the URL in the Session list, and use the context menu to Test the URL as configured including any HTTP content data to send. You get to see the full HTTP request and response as well as a link in the Request header to go visit the actual page. Not so useful for a POST as above, but definitely useful for GET requests. Finally you can also get a few charts. The most useful one is probably the Request per Second chart which can be accessed from the Charts menu or shortcut. Here’s what it looks like: Results can also be exported to JSON, XML and HTML. Keep in mind that these files can get very large rather quickly though, so exports can end up taking a while to complete. Command Line Interface WebSurge runs with a small core load engine and this engine is plugged into the front end application I’ve shown so far. There’s also a command line interface available to run WebSurge from the Windows command prompt. Using the command line you can run tests for either an individual URL (similar to AB.exe for example) or a full Session file. By default when it runs WebSurgeCli shows progress every second showing total request count, failures and the requests per second for the entire test. A silent option can turn off this progress display and display only the results. The command line interface can be useful for build integration which allows checking for failures perhaps or hitting a specific requests per second count etc. It’s also nice to use this as quick and dirty URL test facility similar to the way you’d use Apache Bench (ab.exe). Unlike ab.exe though, WebSurgeCli supports SSL and makes it much easier to create multi-URL tests using either manual editing or the WebSurge UI. Current Status Currently West Wind WebSurge is still in Beta status. I’m still adding small new features and tweaking the UI in an attempt to make it as easy and self-explanatory as possible to run. Documentation for the UI and specialty features is also still a work in progress. I plan on open-sourcing this product, but it won’t be free. There’s a free version available that provides a limited number of threads and request URLs to run. A relatively low cost license removes the thread and request limitations. Pricing info can be found on the Web site – there’s an introductory price which is $99 at the moment which I think is reasonable compared to most other for pay solutions out there that are exorbitant by comparison… The reason code is not available yet is – well, the UI portion of the app is a bit embarrassing in its current monolithic state. The UI started as a very simple interface originally that later got a lot more complex – yeah, that never happens, right? Unless there’s a lot of interest I don’t foresee re-writing the UI entirely (which would be ideal), but in the meantime at least some cleanup is required before I dare to publish it :-). The code will likely be released with version 1.0. I’m very interested in feedback. Do you think this could be useful to you and provide value over other tools you may or may not have used before? I hope so – it already has provided a ton of value for me and the work I do that made the development worthwhile at this point. You can leave a comment below, or for more extensive discussions you can post a message on the West Wind Message Board in the WebSurge section Microsoft MVPs and Insiders get a free License If you’re a Microsoft MVP or a Microsoft Insider you can get a full license for free. Send me a link to your current, official Microsoft profile and I’ll send you a not-for resale license. Send any messages to [email protected]. Resources For more info on WebSurge and to download it to try it out, use the following links. West Wind WebSurge Home Download West Wind WebSurge Getting Started with West Wind WebSurge Video© Rick Strahl, West Wind Technologies, 2005-2014Posted in ASP.NET Tweet !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs"); (function() { var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = 'https://apis.google.com/js/plusone.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); })();

Read the article
Top things web developers should know about the Visual Studio 2013 release

- by Jon Galloway

ASP.NET and Web Tools for Visual Studio 2013 Release NotesASP.NET and Web Tools for Visual Studio 2013 Release NotesSummary for lazy readers: Visual Studio 2013 is now available for download on the Visual Studio site and on MSDN subscriber downloads) Visual Studio 2013 installs side by side with Visual Studio 2012 and supports round-tripping between Visual Studio versions, so you can try it out without committing to a switch Visual Studio 2013 ships with the new version of ASP.NET, which includes ASP.NET MVC 5, ASP.NET Web API 2, Razor 3, Entity Framework 6 and SignalR 2.0 The new releases ASP.NET focuses on One ASP.NET, so core features and web tools work the same across the platform (e.g. adding ASP.NET MVC controllers to a Web Forms application) New core features include new templates based on Bootstrap, a new scaffolding system, and a new identity system Visual Studio 2013 is an incredible editor for web files, including HTML, CSS, JavaScript, Markdown, LESS, Coffeescript, Handlebars, Angular, Ember, Knockdown, etc. Top links: Visual Studio 2013 content on the ASP.NET site are in the standard new releases area: http://www.asp.net/vnext ASP.NET and Web Tools for Visual Studio 2013 Release Notes Short intro videos on the new Visual Studio web editor features from Scott Hanselman and Mads Kristensen Announcing release of ASP.NET and Web Tools for Visual Studio 2013 post on the official .NET Web Development and Tools Blog Scott Guthrie's post: Announcing the Release of Visual Studio 2013 and Great Improvements to ASP.NET and Entity Framework Okay, for those of you who are still with me, let's dig in a bit. Quick web dev notes on downloading and installing Visual Studio 2013 I found Visual Studio 2013 to be a pretty fast install. According to Brian Harry's release post, installing over pre-release versions of Visual Studio is supported. I've installed the release version over pre-release versions, and it worked fine. If you're only going to be doing web development, you can speed up the install if you just select Web Developer tools. Of course, as a good Microsoft employee, I'll mention that you might also want to install some of those other features, like the Store apps for Windows 8 and the Windows Phone 8.0 SDK, but they do download and install a lot of other stuff (e.g. the Windows Phone SDK sets up Hyper-V and downloads several GB's of VM's). So if you're planning just to do web development for now, you can pick just the Web Developer Tools and install the other stuff later. If you've got a fast internet connection, I recommend using the web installer instead of downloading the ISO. The ISO includes all the features, whereas the web installer just downloads what you're installing. Visual Studio 2013 development settings and color theme When you start up Visual Studio, it'll prompt you to pick some defaults. These are totally up to you -whatever suits your development style - and you can change them later. As I said, these are completely up to you. I recommend either the Web Development or Web Development (Code Only) settings. The only real difference is that Code Only hides the toolbars, and you can switch between them using Tools / Import and Export Settings / Reset. Web Development settings Web Development (code only) settings Usually I've just gone with Web Development (code only) in the past because I just want to focus on the code, although the Standard toolbar does make it easier to switch default web browsers. More on that later. Color theme Sigh. Okay, everyone's got their favorite colors. I alternate between Light and Dark depending on my mood, and I personally like how the low contrast on the window chrome in those themes puts the emphasis on my code rather than the tabs and toolbars. I know some people got pretty worked up over that, though, and wanted the blue theme back. I personally don't like it - it reminds me of ancient versions of Visual Studio that I don't want to think about anymore. So here's the thing: if you install Visual Studio Ultimate, it defaults to Blue. The other versions default to Light. If you use Blue, I won't criticize you - out loud, that is. You can change themes really easily - either Tools / Options / Environment / General, or the smart way: ctrl+q for quick launch, then type Theme and hit enter. Signing in During the first run, you'll be prompted to sign in. You don't have to - you can click the "Not now, maybe later" link at the bottom of that dialog. I recommend signing in, though. It's not hooked in with licensing or tracking the kind of code you write to sell you components. It is doing good things, like syncing your Visual Studio settings between computers. More about that here. So, you don't have to, but I sure do. Overview of shiny new things in ASP.NET land There are a lot of good new things in ASP.NET. I'll list some of my favorite here, but you can read more on the ASP.NET site. One ASP.NET You've heard us talk about this for a while. The idea is that options are good, but choice can be a burden. When you start a new ASP.NET project, why should you have to make a tough decision - with long-term consequences - about how your application will work? If you want to use ASP.NET Web Forms, but have the option of adding in ASP.NET MVC later, why should that be hard? It's all ASP.NET, right? Ideally, you'd just decide that you want to use ASP.NET to build sites and services, and you could use the appropriate tools (the green blocks below) as you needed them. So, here it is. When you create a new ASP.NET application, you just create an ASP.NET application. Next, you can pick from some templates to get you started... but these are different. They're not "painful decision" templates, they're just some starting pieces. And, most importantly, you can mix and match. I can pick a "mostly" Web Forms template, but include MVC and Web API folders and core references. If you've tried to mix and match in the past, you're probably aware that it was possible, but not pleasant. ASP.NET MVC project files contained special project type GUIDs, so you'd only get controller scaffolding support in a Web Forms project if you manually edited the csproj file. Features in one stack didn't work in others. Project templates were painful choices. That's no longer the case. Hooray! I just did a demo in a presentation last week where I created a new Web Forms + MVC + Web API site, built a model, scaffolded MVC and Web API controllers with EF Code First, add data in the MVC view, viewed it in Web API, then added a GridView to the Web Forms Default.aspx page and bound it to the Model. In about 5 minutes. Sure, it's a simple example, but it's great to be able to share code and features across the whole ASP.NET family. Authentication In the past, authentication was built into the templates. So, for instance, there was an ASP.NET MVC 4 Intranet Project template which created a new ASP.NET MVC 4 application that was preconfigured for Windows Authentication. All of that authentication stuff was built into each template, so they varied between the stacks, and you couldn't reuse them. You didn't see a lot of changes to the authentication options, since they required big changes to a bunch of project templates. Now, the new project dialog includes a common authentication experience. When you hit the Change Authentication button, you get some common options that work the same way regardless of the template or reference settings you've made. These options work on all ASP.NET frameworks, and all hosting environments (IIS, IIS Express, or OWIN for self-host) The default is Individual User Accounts: This is the standard "create a local account, using username / password or OAuth" thing; however, it's all built on the new Identity system. More on that in a second. The one setting that has some configuration to it is Organizational Accounts, which lets you configure authentication using Active Directory, Windows Azure Active Directory, or Office 365. Identity There's a new identity system. We've taken the best parts of the previous ASP.NET Membership and Simple Identity systems, rolled in a lot of feedback and made big enhancements to support important developer concerns like unit testing and extensiblity. I've written long posts about ASP.NET identity, and I'll do it again. Soon. This is not that post. The short version is that I think we've finally got just the right Identity system. Some of my favorite features: There are simple, sensible defaults that work well - you can File / New / Run / Register / Login, and everything works. It supports standard username / password as well as external authentication (OAuth, etc.). It's easy to customize without having to re-implement an entire provider. It's built using pluggable pieces, rather than one large monolithic system. It's built using interfaces like IUser and IRole that allow for unit testing, dependency injection, etc. You can easily add user profile data (e.g. URL, twitter handle, birthday). You just add properties to your ApplicationUser model and they'll automatically be persisted. Complete control over how the identity data is persisted. By default, everything works with Entity Framework Code First, but it's built to support changes from small (modify the schema) to big (use another ORM, store your data in a document database or in the cloud or in XML or in the EXIF data of your desktop background or whatever). It's configured via OWIN. More on OWIN and Katana later, but the fact that it's built using OWIN means it's portable. You can find out more in the Authentication and Identity section of the ASP.NET site (and lots more content will be going up there soon). New Bootstrap based project templates The new project templates are built using Bootstrap 3. Bootstrap (formerly Twitter Bootstrap) is a front-end framework that brings a lot of nice benefits: It's responsive, so your projects will automatically scale to device width using CSS media queries. For example, menus are full size on a desktop browser, but on narrower screens you automatically get a mobile-friendly menu. The built-in Bootstrap styles make your standard page elements (headers, footers, buttons, form inputs, tables etc.) look nice and modern. Bootstrap is themeable, so you can reskin your whole site by dropping in a new Bootstrap theme. Since Bootstrap is pretty popular across the web development community, this gives you a large and rapidly growing variety of templates (free and paid) to choose from. Bootstrap also includes a lot of very useful things: components (like progress bars and badges), useful glyphicons, and some jQuery plugins for tooltips, dropdowns, carousels, etc.). Here's a look at how the responsive part works. When the page is full screen, the menu and header are optimized for a wide screen display: When I shrink the page down (this is all based on page width, not useragent sniffing) the menu turns into a nice mobile-friendly dropdown: For a quick example, I grabbed a new free theme off bootswatch.com. For simple themes, you just need to download the boostrap.css file and replace the /content/bootstrap.css file in your project. Now when I refresh the page, I've got a new theme: Scaffolding The big change in scaffolding is that it's one system that works across ASP.NET. You can create a new Empty Web project or Web Forms project and you'll get the Scaffold context menus. For release, we've got MVC 5 and Web API 2 controllers. We had a preview of Web Forms scaffolding in the preview releases, but they weren't fully baked for RTM. Look for them in a future update, expected pretty soon. This scaffolding system wasn't just changed to work across the ASP.NET frameworks, it's also built to enable future extensibility. That's not in this release, but should also hopefully be out soon. Project Readme page This is a small thing, but I really like it. When you create a new project, you get a Project_Readme.html page that's added to the root of your project and opens in the Visual Studio built-in browser. I love it. A long time ago, when you created a new project we just dumped it on you and left you scratching your head about what to do next. Not ideal. Then we started adding a bunch of Getting Started information to the new project templates. That told you what to do next, but you had to delete all of that stuff out of your website. It doesn't belong there. Not ideal. This is a simple HTML file that's not integrated into your project code at all. You can delete it if you want. But, it shows a lot of helpful links that are current for the project you just created. In the future, if we add new wacky project types, they can create readme docs with specific information on how to do appropriately wacky things. Side note: I really like that they used the internal browser in Visual Studio to show this content rather than popping open an HTML page in the default browser. I hate that. It's annoying. If you're doing that, I hope you'll stop. What if some unnamed person has 40 or 90 tabs saved in their browser session? When you pop open your "Thanks for installing my Visual Studio extension!" page, all eleventy billion tabs start up and I wish I'd never installed your thing. Be like these guys and pop stuff Visual Studio specific HTML docs in the Visual Studio browser. ASP.NET MVC 5 The biggest change with ASP.NET MVC 5 is that it's no longer a separate project type. It integrates well with the rest of ASP.NET. In addition to that and the other common features we've already looked at (Bootstrap templates, Identity, authentication), here's what's new for ASP.NET MVC. Attribute routing ASP.NET MVC now supports attribute routing, thanks to a contribution by Tim McCall, the author of http://attributerouting.net. With attribute routing you can specify your routes by annotating your actions and controllers. This supports some pretty complex, customized routing scenarios, and it allows you to keep your route information right with your controller actions if you'd like. Here's a controller that includes an action whose method name is Hiding, but I've used AttributeRouting to configure it to /spaghetti/with-nesting/where-is-waldo public class SampleController : Controller { [Route("spaghetti/with-nesting/where-is-waldo")] public string Hiding() { return "You found me!"; } } I enable that in my RouteConfig.cs, and I can use that in conjunction with my other MVC routes like this: public class RouteConfig { public static void RegisterRoutes(RouteCollection routes) { routes.IgnoreRoute("{resource}.axd/{*pathInfo}"); routes.MapMvcAttributeRoutes(); routes.MapRoute( name: "Default", url: "{controller}/{action}/{id}", defaults: new { controller = "Home", action = "Index", id = UrlParameter.Optional } ); } } You can read more about Attribute Routing in ASP.NET MVC 5 here. Filter enhancements There are two new additions to filters: Authentication Filters and Filter Overrides. Authentication filters are a new kind of filter in ASP.NET MVC that run prior to authorization filters in the ASP.NET MVC pipeline and allow you to specify authentication logic per-action, per-controller, or globally for all controllers. Authentication filters process credentials in the request and provide a corresponding principal. Authentication filters can also add authentication challenges in response to unauthorized requests. Override filters let you change which filters apply to a given action method or controller. Override filters specify a set of filter types that should not be run for a given scope (action or controller). This allows you to configure filters that apply globally but then exclude certain global filters from applying to specific actions or controllers. ASP.NET Web API 2 ASP.NET Web API 2 includes a lot of new features. Attribute Routing ASP.NET Web API supports the same attribute routing system that's in ASP.NET MVC 5. You can read more about the Attribute Routing features in Web API in this article. OAuth 2.0 ASP.NET Web API picks up OAuth 2.0 support, using security middleware running on OWIN (discussed below). This is great for features like authenticated Single Page Applications. OData Improvements ASP.NET Web API now has full OData support. That required adding in some of the most powerful operators: $select, $expand, $batch and $value. You can read more about OData operator support in this article by Mike Wasson. Lots more There's a huge list of other features, including CORS (cross-origin request sharing), IHttpActionResult, IHttpRequestContext, and more. I think the best overview is in the release notes. OWIN and Katana I've written about OWIN and Katana recently. I'm a big fan. OWIN is the Open Web Interfaces for .NET. It's a spec, like HTML or HTTP, so you can't install OWIN. The benefit of OWIN is that it's a community specification, so anyone who implements it can plug into the ASP.NET stack, either as middleware or as a host. Katana is the Microsoft implementation of OWIN. It leverages OWIN to wire up things like authentication, handlers, modules, IIS hosting, etc., so ASP.NET can host OWIN components and Katana components can run in someone else's OWIN implementation. Howard Dierking just wrote a cool article in MSDN magazine describing Katana in depth: Getting Started with the Katana Project. He had an interesting example showing an OWIN based pipeline which leveraged SignalR, ASP.NET Web API and NancyFx components in the same stack. If this kind of thing makes sense to you, that's great. If it doesn't, don't worry, but keep an eye on it. You're going to see some cool things happen as a result of ASP.NET becoming more and more pluggable. Visual Studio Web Tools Okay, this stuff's just crazy. Visual Studio has been adding some nice web dev features over the past few years, but they've really cranked it up for this release. Visual Studio is by far my favorite code editor for all web files: CSS, HTML, JavaScript, and lots of popular libraries. Stop thinking of Visual Studio as a big editor that you only use to write back-end code. Stop editing HTML and CSS in Notepad (or Sublime, Notepad++, etc.). Visual Studio starts up in under 2 seconds on a modern computer with an SSD. Misspelling HTML attributes or your CSS classes or jQuery or Angular syntax is stupid. It doesn't make you a better developer, it makes you a silly person who wastes time. Browser Link Browser Link is a real-time, two-way connection between Visual Studio and all connected browsers. It's only attached when you're running locally, in debug, but it applies to any and all connected browser, including emulators. You may have seen demos that showed the browsers refreshing based on changes in the editor, and I'll agree that's pretty cool. But it's really just the start. It's a two-way connection, and it's built for extensiblity. That means you can write extensions that push information from your running application (in IE, Chrome, a mobile emulator, etc.) back to Visual Studio. Mads and team have showed off some demonstrations where they enabled edit mode in the browser which updated the source HTML back on the browser. It's also possible to look at how the rendered HTML performs, check for compatibility issues, watch for unused CSS classes, the sky's the limit. New HTML editor The previous HTML editor had a lot of old code that didn't allow for improvements. The team rewrote the HTML editor to take advantage of the new(ish) extensibility features in Visual Studio, which then allowed them to add in all kinds of features - things like CSS Class and ID IntelliSense (so you type style="" and get a list of classes and ID's for your project), smart indent based on how your document is formatted, JavaScript reference auto-sync, etc. Here's a 3 minute tour from Mads Kristensen. The previous HTML editor had a lot of old code that didn't allow for improvements. The team rewrote the HTML editor to take advantage of the new(ish) extensibility features in Visual Studio, which then allowed them to add in all kinds of features - things like CSS Class and ID IntelliSense (so you type style="" and get a list of classes and ID's for your project), smart indent based on how your document is formatted, JavaScript reference auto-sync, etc. Lots more Visual Studio web dev features That's just a sampling - there's a ton of great features for JavaScript editing, CSS editing, publishing, and Page Inspector (which shows real-time rendering of your page inside Visual Studio). Here are some more short videos showing those features. Lots, lots more Okay, that's just a summary, and it's still quite a bit. Head on over to http://asp.net/vnext for more information, and download Visual Studio 2013 now to get started!

Read the article
IIS SSL Certificate Renewal Pain

- by Rick Strahl

I’m in the middle of my annual certificate renewal for the West Wind site and I can honestly say that I hate IIS’s certificate system. When it works it’s fine, but when it doesn’t man can it be a pain. Because I deal with public certificates on my site merely once a year, and you have to perform the certificate dance just the right way, I seem to run into some sort of trouble every year, thinking that Microsoft surely must have addressed the issues I ran into previously – HA! Not so. Don’t ever use the Renew Certificate Feature in IIS! The first rule that I should have never forgotten is that certificate renewals in IIS (7 is what I’m using but I think it’s no different in 7.5 and 8), simply don’t work if you’re submitting to get a public certificate from a certificate authority. I use DNSimple for my DNS domain management and SSL certificates because they provide ridiculously easy domain management and good prices for SSL certs – especially wildcard certificates, which is what I use on west-wind.com. Certificates in IIS can be found pegged to the machine root. If you go into the IIS Manager, go to the machine root the tree and then click on certificates and you then get various certificate options: Both of these options create a new Certificate request (CSR), which is just a text file. But if you’re silly enough like me to click on the Renew button on your old certificate, you’ll find that you end up generating a very long Certificate Request that looks nothing like the original certificate request and the format that’s used for this is not accepted by most certificate authorities. While I’m not sure exactly what the problem is, it simply looks like IIS is respecting none of your original certificate bit size choices and is generating a huge certificate request that is 3 times the size of a ‘normal’ certificate request. The end result is (and I’ve done this at least twice now) is that the certificate processor is likely to fail processing those renewals. Always create a new Certificate While it’s a little more work and you have to remember how to fill out the certificate request properly, this is the safe way to make sure your certificate generates properly. First comes the Distinguished Name Properties dialog: Ah yes you have to love the nomenclature of this stuff. Distinguished name, Common name – WTF is a common name? It doesn’t look common to me! Make sure this form gets filled out correctly. Common NameThis is the domain name of the Web site. In my case I’m creating a wildcard certificate so I’m using the * prefix. If you’re purchasing a certificate for a specific domain use www.west-wind.com or store.west-wind.com for example. Make sure this matches the EXACT domain you’re trying to use secure access on because that’s all the certificate is going to work on unless you get a wildcard certificate. Organization Is the name of your company or organization. Depending on the kind of certificate you purchase this name will show up on your certificate. Most low end SSL certificates (ie. those that cost under $100 for single domains) don’t list the organization, the higher signature certificates that also require extensive validation by the cert authority do. Regardless you should make sure this matches the right company/organization. Organizational Unit This can be anything. Not really sure what this is for, but traditionally I’ve always set this to Web because – well this is a Web thing after all right? I’ve never seen this used anywhere that I can tell other than to internally reference the cert. State and CountryPretty obvious. Should reflect the location of the business/organization/person or site. Next you have to configure the bit size used for the certificate: The default on this dialog is 1024, but I’ve found that most providers these days request a minimum bit length of 2048, as did my DNSimple provider. Again check with the provider when you submit to make sure. Bit length mismatches can cause problems if you use a size that isn’t supported by the provider. I had that happen last year when I submitted my CSR and it got rejected quite a bit later, when the certs usually are issued within an hour or less. When you’re done here, the certificate is saved to disk as a .txt file and it should look something like this (this is a 2048 bit length CSR):-----BEGIN NEW CERTIFICATE REQUEST----- MIIEVGCCAz0CAQAwdjELMAkGA1UEBhMCVVMxDzANBgNVBAgMBkhhd2FpaTENMAsG A1UEBwwEUGFpYTEfMB0GA1UECgwWV2VzdCBXaW5kIFRlY2hub2xvZ2llczEMMAoG B1UECwwDV2ViMRgwFgYDVQQDDA8qLndlc3Qtd2luZC5jb20wggEiMA0GCSqGSIb3 DQEBAQUAA4IBDwAwggEKAoIBAQDIPWOFMkMVRp2Ftj9w/cCVV4OYYhoZYtl+8lTk oqDwKca0xWHLgioX/9v0rZLS6a82MHqKEBxVXu+cuCmSE4AQtB/1YH9lS4tpc/be OZDvnTotP6l4MCEzzAfROcw4CiIg6X0RMSnl8IATAvv2V5LQM9TDdt9oDdMpX2IY +vVC9RZ7PMHBmR9kwI2i/lrKitzhQKaHgpmKcRlM6iqpALUiX28w5HJaDKK1MDHN 607tyFJLHijuJKx7PdTqZYf50KkC3NupfZ2avVycf18Q13jHWj59tvwEOczoVzRL l4LQivAqbhyiqMpWnrZunIOUZta5aGm+jo7O1knGWJjxuraTAgMBAAGgggGYMBoG CisGAQQBgjcNAgMxDBYKNi4yLjkyMDAuMjA0BgkrBgEEAYI3FRQxJzAlAgEFDAZS QVNYUFMMC1JBU1hQU1xSaWNrDAtJbmV0TWdyLmV4ZTByBgorBgEEAYI3DQICMWQw YgIBAR5aAE0AaQBjAHIAbwBzAG8AZgB0ACAAUgBTAEEAIABTAEMAaABhAG4AbgBl AGwAIABDAHIAeQBwAHQAbwBnAHIAYQBwAGgAaQBjACAAUAByAG8AdgBpAGQAZQBy AwEAMIHPBgkqhkiG9w0BCQ4xgcEwgb4wDgYDVR0PAQH/BAQDAgTwMBMGA1UdJQQM MAoGCCsGAQUFBwMBMHgGCSqGSIb3DQEJDwRrMGkwDgYIKoZIhvcNAwICAgCAMA4G CCqGSIb3DQMEAgIAgDALBglghkgBZQMEASowCwYJYIZIAWUDBAEtMAsGCWCGSAFl AwQBAjALBglghkgBZQMEAQUwBwYFKw4DAgcwCgYIKoZIhvcNAwcwHQYDVR0OBBYE FD/yOsTbXE+GVFCFMmldzQvyloz9MA0GCSqGSIb3DQEBBQUAA4IBAQCK6LlsCuIM 1AU0niB6QZ9v0FTsGFxP1dYvVUnJyY6VEKNiGFiQjZac7UCs0p58yScdXWEFOE8V OsjAYD3xYNc05+ckyD67UHRGEUAVB9RBvbKW23KeR/8kBmEzc8PemD52YOgExxAJ 57xWmAwEHAvbgYzQvhO8AOzH3TGvvHbg5UKM1pYgNmuwZq5DkL/IDoeIJwfk/wrI wghNTuxxIFgbH4YrgLgv4PRvrS/LaTCRBdboaCgzATMczaOb1nd/DVNR+3fCtMhM W0psTAjzRbmXF3nJyAQa7jF/52gkY0RfFX2lG5tJnG+XDsVNvKNvh9Qa5Tlmkm06 ILKCm9ciWCKk -----END NEW CERTIFICATE REQUEST----- You can take that certificate request and submit that to your certificate provider. Since this is base64 encoded you can typically just paste it into a text box on the submission page, or some providers will ask you to upload the CSR as a file. What does a Renewal look like? Note the length of the CSR will vary somewhat with key strength, but compare this to a renewal request that IIS generated from my existing site:-----BEGIN NEW CERTIFICATE REQUEST----- MIIPpwYFKoZIhvcNAQcCoIIPmDCCD5QCAQExCzAJBgUrDgMCGgUAMIIIqAYJKoZI hvcNAQcBoIIImQSCCJUwggiRMIIH+gIBADBdMSEwHwYDVQQLDBhEb21haW4gQ29u dHJvbCBWYWxpFGF0ZWQxHjAcBgNVBAsMFUVzc2VudGlhbFNTTCBXaWxkY2FyZDEY MBYGA1UEAwwPKi53ZXN0LXdpbmQuY29tMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCB iQKBgQCK4OuIOR18Wb8tNMGRZiD1c9X57b332Lj7DhbckFqLs0ys8kVDHrTXSj+T Ye9nmAvfPpZmBtE5p9qRNN79rUYugAdl+qEtE4IJe1bRfxXzcKa1SXa8+TEs3zQa zYSmcR2dDuC8om1eAdeCtt0NnkvANgm1VLwGOor/UHMASaEhCQIDAQABoIIG8jAa BgorBgEEAYI3DQIDMQwWCjYuMi45MjAwLjIwNAYJKwYBBAGCNxUUMScwJQIBBQwG UkFTWFBTDAtSQVNYUFNcUmljawwLSW5ldE1nci5leGUwZgYKKwYBBAGCNw0CAjFY MFYCAQIeTgBNAGkAYwByAG8AcwBvAGYAdAAgAFMAdAByAG8AbgBnACAAQwByAHkA cAB0AG8AZwByAGEAcABoAGkAYwAgAFAAcgBvAHYAaQBkAGUAcgMBADCCAQAGCSqG SIb3DQEJDjGB8jCB7zAOBgNVHQ8BAf8EBAMCBaAwDAYDVR0TAQH/BAIwADA0BgNV HSUELTArBggrBgEFBQcDAQYIKwYBBQUHAwIGCisGAQQBgjcKAwMGCWCGSAGG+EIE ATBPBgNVHSAESDBGMDoGCysGAQQBsjEBAgIHMCswKQYIKwYBBQUHAgEWHWh0dHBz Oi8vc2VjdXJlLmNvbW9kby5jb20vQ1BTMAgGBmeBDAECATApBgNVHREEIjAggg8q Lndlc3Qtd2luZC5jb22CDXdlc3Qtd2luZC5jb20wHQYDVR0OBBYEFEVLAyO8gDiv lsfovKrx9mHPyrsiMIIFMAYJKwYBBAGCNw0BMYIFITCCBR0wggQFoAMCAQICEQDu 1E1T5Jvtkm5LOfSHabWlMA0GCSqGSIb3DQEBBQUAMHIxCzAJBgNVBAYTAkdCMRsw GQYDVQQIExJHcmVhdGVyIE1hbmNoZXN0ZXIxEDAOBgNVBAcTB1NhbGZvcmQxGjAY BgNVBAoTEUNPTU9ETyBDQSBMaW1pdGVkMRgwFgYDVQQDEw9Fc3NlbnRpYWxTU0wg Q0EwHhcNMTQwNTA3MDAwMDAwWhcNMTUwNjA2MjM1OTU5WjBdMSEwHwYDVQQLExhE b21haW4gQ29udHJvbCBWYWxpZGF0ZWQxHjAcBgNVBAsTFUVzc2VudGlhbFNTTCBX aWxkY2FyZDEYMBYGA1UEAxQPKi53ZXN0LXdpbmQuY29tMIIBIjANBgkqhkiG9w0B AQEFAAOCAQ8AMIIBCgKCAQEAiyKfL66XB51DlUfm6xXqJBcvMU2qorRHxC+WjEpB amvg8XoqNfCKzDAvLMbY4BLhbYCTagqtslnP3Gj4AKhXqRKU0n6iSbmS1gcWzCJM CHufZ5RDtuTuxhTdJxzP9YqZUfKV5abWQp/TK6V1ryaBJvdqM73q4tRjrQODtkiR PfZjxpybnBHFJS8jYAf8jcOjSDZcgN1d9Evc5MrEJCp/90cAkozyF/NMcFtD6Yj8 UM97z3MzDT2JPDoH3kAr3cCgpUNyQ2+wDNCnL9eWYFkOQi8FZMsZol7KlZ5NgNfO a7iZMVGbqDg6rkS//2uGe6tSQJTTs+mAZB+na+M8XT2UqwIDAQABo4IBwTCCAb0w HwYDVR0jBBgwFoAU2svqrVsIXcz//CZUzknlVcY49PgwHQYDVR0OBBYEFH0AmLiL RSEL9+sQD/n5O4N7/nnqMA4GA1UdDwEB/wQEAwIFoDAMBgNVHRMBAf8EAjAAMDQG A1UdJQQtMCsGCCsGAQUFBwMBBggrBgEFBQcDAgYKKwYBBAGCNwoDAwYJYIZIAYb4 QgQBME8GA1UdIARIMEYwOgYLKwYBBAGyMQECAgcwKzApBggrBgEFBQcCARYdaHR0 cHM6Ly9zZWN1cmUuY29tb2RvLmNvbS9DUFMwCAYGZ4EMAQIBMDsGA1UdHwQ0MDIw MKAuoCyGKmh0dHA6Ly9jcmwuY29tb2RvY2EuY29tL0Vzc2VudGlhbFNTTENBLmNy bDBuBggrBgEFBQcBAQRiMGAwOAYIKwYBBQUHMAKGLGh0dHA6Ly9jcnQuY29tb2Rv Y2EuY29tL0Vzc2VudGlhbFNTTENBXzIuY3J0MCQGCCsGAQUFBzABhhhodHRwOi8v b2NzcC5jb21vZG9jYS5jb20wKQYDVR0RBCIwIIIPKi53ZXN0LXdpbmQuY29tgg13 ZXN0LXdpbmQuY29tMA0GCSqGSIb3DQEBBQUAA4IBAQBqBfd6QHrxXsfgfKARG6np 8yszIPhHGPPmaE7xq7RpcZjY9H+8l6fe4jQbGFjbA5uHBklYI4m2snhPaW2p8iF8 YOkm2V2hEsSTnkf5/flw9mZtlCFEDFXSsBxBdNz8RYTthPMu1h09C0XuDB30sztg nR692FrxJN5/bXsk+MC9nEweTFW/t2HW+XZ8bhM7vsAS+pZionR4MyuQ0mYIt/lD csZVZ91KxTsIm8rNMkkYGFoSIXjQ0+0tCbxMF0i2qnpmNRpA6PU8l7lxxvPkplsk 9KB8QIPFrR5p/i/SUAd9vECWh5+/ktlcrfFP2PK7XcEwWizsvMrNqLyvQVNXSUPT MA0GCSqGSIb3DQEBBQUAA4GBABt/NitwMzc5t22p5+zy4HXbVYzLEjesLH8/v0ot uLQ3kkG8tIWNh5RplxIxtilXt09H4Oxpo3fKUN0yw+E6WsBfg0sAF8pHNBdOJi48 azrQbt4HvKktQkGpgYFjLsormjF44SRtToLHlYycDHBNvjaBClUwMCq8HnwY6vDq xikRoIIFITCCBR0wggQFoAMCAQICEQDu1E1T5Jvtkm5LOfSHabWlMA0GCSqGSIb3 DQEBBQUAMHIxCzAJBgNVBAYTAkdCMRswGQYDVQQIExJHcmVhdGVyIE1hbmNoZXN0 ZXIxEDAOBgNVBAcTB1NhbGZvcmQxGjAYBgNVBAoTEUNPTU9ETyBDQSBMaW1pdGVk MRgwFgYDVQQDEw9Fc3NlbnRpYWxTU0wgQ0EwHhcNMTQwNTA3MDAwMDAwWhcNMTUw NjA2MjM1OTU5WjBdMSEwHwYDVQQLExhEb21haW4gQ29udHJvbCBWYWxpZGF0ZWQx HjAcBgNVBAsTFUVzc2VudGlhbFNTTCBXaWxkY2FyZDEYMBYGA1UEAxQPKi53ZXN0 LXdpbmQuY29tMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAiyKfL66X B51DlUfm6xXqJBcvMU2qorRHxC+WjEpBamvg8XoqNfCKzDAvLMbY4BLhbYCTagqt slnP3Gj4AKhXqRKU0n6iSbmS1gcWzCJMCHufZ5RDtuTuxhTdJxzP9YqZUfKV5abW Qp/TK6V1ryaBJvdqM73q4tRjrQODtkiRPfZjxpybnBHFJS8jYAf8jcOjSDZcgN1d 9Evc5MrEJCp/90cAkozyF/NMcFtD6Yj8UM97z3MzDT2JPDoH3kAr3cCgpUNyQ2+w DNCnL9eWYFkOQi8FZMsZol7KlZ5NgNfOa7iZMVGbqDg6rkS//2uGe6tSQJTTs+mA ZB+na+M8XT2UqwIDAQABo4IBwTCCAb0wHwYDVR0jBBgwFoAU2svqrVsIXcz//CZU zknlVcY49PgwHQYDVR0OBBYEFH0AmLiLRSEL9+sQD/n5O4N7/nnqMA4GA1UdDwEB /wQEAwIFoDAMBgNVHRMBAf8EAjAAMDQGA1UdJQQtMCsGCCsGAQUFBwMBBggrBgEF BQcDAgYKKwYBBAGCNwoDAwYJYIZIAYb4QgQBME8GA1UdIARIMEYwOgYLKwYBBAGy MQECAgcwKzApBggrBgEFBQcCARYdaHR0cHM6Ly9zZWN1cmUuY29tb2RvLmNvbS9D UFMwCAYGZ4EMAQIBMDsGA1UdHwQ0MDIwMKAuoCyGKmh0dHA6Ly9jcmwuY29tb2Rv Y2EuY29tL0Vzc2VudGlhbFNTTENBLmNybDBuBggrBgEFBQcBAQRiMGAwOAYIKwYB BQUHMAKGLGh0dHA6Ly9jcnQuY29tb2RvY2EuY29tL0Vzc2VudGlhbFNTTENBXzIu Y3J0MCQGCCsGAQUFBzABhhhodHRwOi8vb2NzcC5jb21vZG9jYS5jb20wKQYDVR0R BCIwIIIPKi53ZXN0LXdpbmQuY29tgg13ZXN0LXdpbmQuY29tMA0GCSqGSIb3DQEB BQUAA4IBAQBqBfd6QHrxXsfgfKARG6np8yszIPhHGPPmaE7xq7RpcZjY9H+8l6fe 4jQbGFjbA5uHBklYI4m2snhPaW2p8iF8YOkm2V2hEsSTnkf5/flw9mZtlCFEDFXS sBxBdNz8RYTthPMu1h09C0XuDB30sztgnR692FrxJN5/bXsk+MC9nEweTFW/t2HW +XZ8bhM7vsAS+pZionR4MyuQ0mYIt/lDcsZVZ91KxTsIm8rNMkkYGFoSIXjQ0+0t CbxMF0i2qnpmNRpA6PU8l7lxxvPkplsk9KB8QIPFrR5p/i/SUAd9vECWh5+/ktlc rfFP2PK7XcEwWizsvMrNqLyvQVNXSUPTMYIBrzCCAasCAQEwgYcwcjELMAkGA1UE BhMCR0IxGzAZBgNVBAgTEkdyZWF0ZXIgTWFuY2hlc3RlcjEQMA4GA1UEBxMHU2Fs Zm9yZDEaMBgGA1UEChMRQ09NT0RPIENBIExpbWl0ZWQxGDAWBgNVBAMTD0Vzc2Vu dGlhbFNTTCBDQQIRAO7UTVPkm+2Sbks59IdptaUwCQYFKw4DAhoFADANBgkqhkiG 9w0BAQEFAASCAQB8PNQ6bYnQpWfkHyxnDuvNKw3wrqF2p7JMZm+SuN2qp3R2LpCR mW2LrGtQIm9Iob/QOYH+8houYNVdvsATGPXX2T8gzn+anof4tOG0vCTK1Bp9bwf9 MkRP+1c8RW/vkYmUW4X5/C+y3CZpMH5dDTaXBIpXFzjX/fxNpH/rvLzGiaYYL3Cn OLO+aOADr9qq5yoqwpiYCSfYNNYKTUNNGfYIidQwYtbHXEYhSukB2oR89xD2sZZ4 bOqFjUPgTa5SsERLDDeg3omMKiIXVYGxlqBEq51Kge6IQt4qQV9P9VgInW7cWmKe dTqNHI9ri3ttewdEnT++TKGKKfTjX9SR8Waj -----END NEW CERTIFICATE REQUEST----- Clearly there’s something very different between this an my original request! And it didn’t work. IIS creates a custom CSR that is encoded in a format that no certificate authority I’ve ever used uses. If you want the gory details of what’s in there look at this ServerFault question (thanks to Mika in the comments). In the end it doesn’t matter though – no certificate authority knows what to do with this CSR. So create a new CSR and skip the renewal. Always! Use the same Server Keep in mind that on IIS at least you should always create your certificate on a single server and then when you receive the final certificate from your provider import it on that server. IIS tracks the CSR it created and requires it in order to import the final certificate properly. So if for some reason you try to install the certificate on another server, it won’t work. I’ve also run into trouble trying to install the same certificate twice – this time around I didn’t give my certificate the proper friendly name and IIS failed to allow me to assign the certificate to any of my Web sites. So I removed the certificate and tried to import again, only to find it failed the second time around. There are other ways to fix this, but in my case I had to have the certificate re-issued to work – not what you want to do. Regardless of what you do though, when you import make sure you do it right the first time by crossing all your t’s and dotting your i's– it’ll save you a lot of grief! You don’t actually have to use the server that the certificate gets installed on to generate the CSR and first install it, but it is generally a good idea to do so just so you can get the certificate installed into the right place right away. If you have access to the server where you need to install the certificate you might as well use it. But you can use another machine to generated the and install the certificate, then export the certificate and move it to another machine as needed. So you can use your Dev machine to create a certificate then export it and install it on a live server. More on installation and back up/export later. Installing the Certificate Once you’ve submitted a CSR request your provider will process the request and eventually issue you a new final certificate that contains another text file with the final key to import into your certificate store. IIS does this by combining the content in your certificate request with the original CSR. If all goes well your new certificate shows up in the certificate list and you’re ready to assign the certificate to your sites. Make sure you use a friendly name that matches domain name of your site. So use *.mysite.com or www.mysite.com or store.mysite.com to ensure IIS recognizes the certificate. I made the mistake of not naming my friendly name this way and found that IIS was unable to link my sites to my wildcard certificate. It needed to have the *. as part of the certificate otherwise the Hostname input field was blanked out. Changing the Friendly Name If you by accidentally used an invalid friendly name you can change it later in the Windows certificate store. Bring up a Run Box Type MMC File | Add/Remove Snap In Add Certificates | Computer Account | Local Computer Drill into Certificates | Personal | Certificates Find your Certificate | Right Click | Properties Edit the Friendly Name | Click OK Backing up your Certificate The first thing you should do once your certificate is successfully installed is to back it up! In case your server crashes or you otherwise lose your configuration this will ensure you have an easy way to recover and reinstall your certificate either on the same server or a different one. If you’re running a server farm or using a wildcard certificate you also need to get the certificate onto other machines and a PFX file import is the easiest way to do this. To back up your certificate select your certificate and choose Export from the context or sidebar menu: The Export Certificate option allows you to export a password protected binary file that you can import in a single step. You can copy the resulting binary PFX file to back up or copy to other machines to install on. Importing the certificate on another machine is as easy as pointing at the PFX file and specifying the password. IIS handles the rest. Assigning a new certificate to your Site Once you have the new certificate installed, all that’s left to do is assign it to your site. In IIS select your Web site and bring up the Site Bindings from the right sidebar. Add a new binding for https, bind it to port 443, specify your hostname and pick the certificate from the pick list. If you’re using a root site make sure to set up your certificate for www.yoursite.com and also for yoursite.com so that both work properly with SSL. Note that you need to explicitly configure each hostname for a certificate if you plan to use SSL. Luckily if you update your SSL certificate in the following year, IIS prompts you and asks whether you like to update all other sites that are using the existing cert to the newer cert. And you’re done. So what’s the Pain? So, all of this is old hat and it doesn’t look all that bad right? So what’s the pain here? Well if you follow the instructions and do everything right, then the process is about as straight forward as you would expect it to be. You create a cert request, you import it and assign it to your sites. That’s the basic steps and to be perfectly fair it works well – if nothing goes wrong. However, renewing tends to be the problem. The first unintuitive issue is that you simply shouldn’t renew but create a new CSR and generate your new certificate from that. Over the years I’ve fallen prey to the belief that Microsoft eventually will fix this so that the renewal creates the same type of CSR as the old cert, but apparently that will just never happen. Booo! The other problem I ran into is that I accidentally misnamed my imported certificate which in turn set off a chain of events that caused my originally issued certificate to become uninstallable. When I received my completed certificate I installed it and it installed just fine, but the friendly name was wrong. As a result IIS refused to assign the certificate to any of my host headered sites. That’s strike number one. Why the heck should the friendly name have any effect on the ability to attach the certificate??? Next I uninstalled the certificate because I figured that would be the easiest way to make sure I get it right. But I found that I could not reinstall my certificate. I kept getting these stop errors: "ASN1 bad tag value met" that would prevent the installation from completion. After searching around for this error and reading countless long messages on forums, I found that this error supposedly does not actually mean the install failed, but the list wouldn’t refresh. Commodo has this to say: Note: There is a known issue in IIS 7 giving the following error: "Cannot find the certificate request associated with this certificate file. A certificate request must be completed on the computer where it was created." You may also receive a message stating "ASN1 bad tag value met". If this is the same server that you generated the CSR on then, in most cases, the certificate is actually installed. Simply cancel the dialog and press "F5" to refresh the list of server certificates. If the new certificate is now in the list, you can continue with the next step. If it is not in the list, you will need to reissue your certificate using a new CSR (see our CSR creation instructions for IIS 7). After creating a new CSR, login to your Comodo account and click the 'replace' button for your certificate. Not sure if this issue is fixed in IIS 8 but that’s an insane bug to have crop up. As it turns out, in my case the refresh didn’t work and the certificate didn’t show up in the IIS list after the reinstall. In fact when looking at the certificate store I could see my certificate was installed in the right place, but the private key is missing which is most likely why IIS is not picking it up. It looks like IIS could not match the final cert to the original CSR generated. But again some sort of message to that affect might be helpful instead of ASN1 bad tag value met. Recovering the Private Key So it turns out my original problem was that I received the published key, but when I imported the private key was missing. There’s a relatively easy way to recover from this. If your certificate doesn’t show up in IIS check in the certificate store for the local machine (see steps above on how to bring this up). If you look at the certificate in Certificates/Personal/Certificates make sure you see the key as shown in the image below: if the key is missing it means that the certificate is missing the private key most likely. To fix a certificate you can do the following: Double click the certificate Go to the Details Tab Copy down the Serial number You can copy the serial number from the area blurred out above. The serial number will be in a format like ?00 a7 9b a1 a4 9d 91 63 57 d6 9f 26 b8 ee 79 b5 cb and you’ll need to strip out the spaces in order to use it in the next step. Next open up an Administrative command prompt and issue the following command: certutil -repairstore my 00a79ba1a49d916357d69f26b8ee79b5cb You should get a confirmation message that the repair worked. If you now go back to the certificate store you should now see the key icon show up on the certificate. Your certificate is fixed. Now go back into IIS Manager and refresh the list of certificates and if all goes well you should see all the certificates that showed in the cert store now: Remember – back up the key first then map to your site… Summary I deal with a lot of customers who run their own IIS servers, and I can’t tell you how often I hear about botched SSL installations. When I posted some of my issues on Twitter yesterday I got a hell storm of “me too” responses. I’m clearly not the only one, who’s run into this especially with renewals. I feel pretty comfortable with IIS configuration and I do a lot of it for support purposes, but the SSL configuration is one that never seems to go seamlessly. This blog post is meant as reminder to myself to read next time I do a renewal. So I can dot my i's and dash my t’s before I get caught in the mess I’m dealing with today. Hopefully some of you find this useful as well.© Rick Strahl, West Wind Technologies, 2005-2014Posted in IIS7 Security Tweet !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs"); (function() { var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = 'https://apis.google.com/js/plusone.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); })();

Read the article
Guidance: A Branching strategy for Scrum Teams

- by Martin Hinshelwood

Having a good branching strategy will save your bacon, or at least your code. Be careful when deviating from your branching strategy because if you do, you may be worse off than when you started! This is one possible branching strategy for Scrum teams and I will not be going in depth with Scrum but you can find out more about Scrum by reading the Scrum Guide and you can even assess your Scrum knowledge by having a go at the Scrum Open Assessment. You can also read SSW’s Rules to Better Scrum using TFS which have been developed during our own Scrum implementations. Acknowledgements Bill Heys – Bill offered some good feedback on this post and helped soften the language. Note: Bill is a VS ALM Ranger and co-wrote the Branching Guidance for TFS 2010 Willy-Peter Schaub – Willy-Peter is an ex Visual Studio ALM MVP turned blue badge and has been involved in most of the guidance including the Branching Guidance for TFS 2010 Chris Birmele – Chris wrote some of the early TFS Branching and Merging Guidance. Dr Paul Neumeyer, Ph.D Parallel Processes, ScrumMaster and SSW Solution Architect – Paul wanted to have feature branches coming from the release branch as well. We agreed that this is really a spin-off that needs own project, backlog, budget and Team. Scenario: A product is developed RTM 1.0 is released and gets great sales. Extra features are demanded but the new version will have double to price to pay to recover costs, work is approved by the guys with budget and a few sprints later RTM 2.0 is released. Sales a very low due to the pricing strategy. There are lots of clients on RTM 1.0 calling out for patches. As I keep getting Reverse Integration and Forward Integration mixed up and Bill keeps slapping my wrists I thought I should have a reminder: You still seemed to use reverse and/or forward integration in the wrong context. I would recommend reviewing your document at the end to ensure that it agrees with the common understanding of these terms merge (forward integration) from parent to child (same direction as the branch), and merge (reverse integration) from child to parent (the reverse direction of the branch). - one of my many slaps on the wrist from Bill Heys. As I mentioned previously we are using a single feature branching strategy in our current project. The single biggest mistake developers make is developing against the “Main” or “Trunk” line. This ultimately leads to messy code as things are added and never finished. Your only alternative is to NEVER check in unless your code is 100%, but this does not work in practice, even with a single developer. Your ADD will kick in and your half-finished code will be finished enough to pass the build and the tests. You do use builds don’t you? Sadly, this is a very common scenario and I have had people argue that branching merely adds complexity. Then again I have seen the other side of the universe ... branching structures from he... We should somehow convince everyone that there is a happy between no-branching and too-much-branching. - Willy-Peter Schaub, VS ALM Ranger, Microsoft A key benefit of branching for development is to isolate changes from the stable Main branch. Branching adds sanity more than it adds complexity. We do try to stress in our guidance that it is important to justify a branch, by doing a cost benefit analysis. The primary cost is the effort to do merges and resolve conflicts. A key benefit is that you have a stable code base in Main and accept changes into Main only after they pass quality gates, etc. - Bill Heys, VS ALM Ranger & TFS Branching Lead, Microsoft The second biggest mistake developers make is branching anything other than the WHOLE “Main” line. If you branch parts of your code and not others it gets out of sync and can make integration a nightmare. You should have your Source, Assets, Build scripts deployment scripts and dependencies inside the “Main” folder and branch the whole thing. Some departments within MSFT even go as far as to add the environments used to develop the product in there as well; although I would not recommend that unless you have a massive SQL cluster to house your source code. We tried the “add environment” back in South-Africa and while it was “phenomenal”, especially when having to switch between environments, the disk storage and processing requirements killed us. We opted for virtualization to skin this cat of keeping a ready-to-go environment handy. - Willy-Peter Schaub, VS ALM Ranger, Microsoft I think people often think that you should have separate branches for separate environments (e.g. Dev, Test, Integration Test, QA, etc.). I prefer to think of deploying to environments (such as from Main to QA) rather than branching for QA). - Bill Heys, VS ALM Ranger & TFS Branching Lead, Microsoft You can read about SSW’s Rules to better Source Control for some additional information on what Source Control to use and how to use it. There are also a number of branching Anti-Patterns that should be avoided at all costs: You know you are on the wrong track if you experience one or more of the following symptoms in your development environment: Merge Paranoia—avoiding merging at all cost, usually because of a fear of the consequences. Merge Mania—spending too much time merging software assets instead of developing them. Big Bang Merge—deferring branch merging to the end of the development effort and attempting to merge all branches simultaneously. Never-Ending Merge—continuous merging activity because there is always more to merge. Wrong-Way Merge—merging a software asset version with an earlier version. Branch Mania—creating many branches for no apparent reason. Cascading Branches—branching but never merging back to the main line. Mysterious Branches—branching for no apparent reason. Temporary Branches—branching for changing reasons, so the branch becomes a permanent temporary workspace. Volatile Branches—branching with unstable software assets shared by other branches or merged into another branch. Note Branches are volatile most of the time while they exist as independent branches. That is the point of having them. The difference is that you should not share or merge branches while they are in an unstable state. Development Freeze—stopping all development activities while branching, merging, and building new base lines. Berlin Wall—using branches to divide the development team members, instead of dividing the work they are performing. -Branching and Merging Primer by Chris Birmele - Developer Tools Technical Specialist at Microsoft Pty Ltd in Australia In fact, this can result in a merge exercise no-one wants to be involved in, merging hundreds of thousands of change sets and trying to get a consolidated build. Again, we need to find a happy medium. - Willy-Peter Schaub on Merge Paranoia Merge conflicts are generally the result of making changes to the same file in both the target and source branch. If you create merge conflicts, you will eventually need to resolve them. Often the resolution is manual. Merging more frequently allows you to resolve these conflicts close to when they happen, making the resolution clearer. Waiting weeks or months to resolve them, the Big Bang approach, means you are more likely to resolve conflicts incorrectly. - Bill Heys, VS ALM Ranger & TFS Branching Lead, Microsoft Figure: Main line, this is where your stable code lives and where any build has known entities, always passes and has a happy test that passes as well? Many development projects consist of, a single “Main” line of source and artifacts. This is good; at least there is source control . There are however a couple of issues that need to be considered. What happens if: you and your team are working on a new set of features and the customer wants a change to his current version? you are working on two features and the customer decides to abandon one of them? you have two teams working on different feature sets and their changes start interfering with each other? I just use labels instead of branches? That's a lot of “what if’s”, but there is a simple way of preventing this. Branching… In TFS, labels are not immutable. This does not mean they are not useful. But labels do not provide a very good development isolation mechanism. Branching allows separate code sets to evolve separately (e.g. Current with hotfixes, and vNext with new development). I don’t see how labels work here. - Bill Heys, VS ALM Ranger & TFS Branching Lead, Microsoft Figure: Creating a single feature branch means you can isolate the development work on that branch. Its standard practice for large projects with lots of developers to use Feature branching and you can check the Branching Guidance for the latest recommendations from the Visual Studio ALM Rangers for other methods. In the diagram above you can see my recommendation for branching when using Scrum development with TFS 2010. It consists of a single Sprint branch to contain all the changes for the current sprint. The main branch has the permissions changes so contributors to the project can only Branch and Merge with “Main”. This will prevent accidental check-ins or checkouts of the “Main” line that would contaminate the code. The developers continue to develop on sprint one until the completion of the sprint. Note: In the real world, starting a new Greenfield project, this process starts at Sprint 2 as at the start of Sprint 1 you would have artifacts in version control and no need for isolation. Figure: Once the sprint is complete the Sprint 1 code can then be merged back into the Main line. There are always good practices to follow, and one is to always do a Forward Integration from Main into Sprint 1 before you do a Reverse Integration from Sprint 1 back into Main. In this case it may seem superfluous, but this builds good muscle memory into your developer’s work ethic and means that no bad habits are learned that would interfere with additional Scrum Teams being added to the Product. The process of completing your sprint development: The Team completes their work according to their definition of done. Merge from “Main” into “Sprint1” (Forward Integration) Stabilize your code with any changes coming from other Scrum Teams working on the same product. If you have one Scrum Team this should be quick, but there may have been bug fixes in the Release branches. (we will talk about release branches later) Merge from “Sprint1” into “Main” to commit your changes. (Reverse Integration) Check-in Delete the Sprint1 branch Note: The Sprint 1 branch is no longer required as its useful life has been concluded. Check-in Done But you are not yet done with the Sprint. The goal in Scrum is to have a “potentially shippable product” at the end of every Sprint, and we do not have that yet, we only have finished code. Figure: With Sprint 1 merged you can create a Release branch and run your final packaging and testing In 99% of all projects I have been involved in or watched, a “shippable product” only happens towards the end of the overall lifecycle, especially when sprints are short. The in-between releases are great demonstration releases, but not shippable. Perhaps it comes from my 80’s brain washing that we only ship when we reach the agreed quality and business feature bar. - Willy-Peter Schaub, VS ALM Ranger, Microsoft Although you should have been testing and packaging your code all the way through your Sprint 1 development, preferably using an automated process, you still need to test and package with stable unchanging code. This is where you do what at SSW we call a “Test Please”. This is first an internal test of the product to make sure it meets the needs of the customer and you generally use a resource external to your Team. Then a “Test Please” is conducted with the Product Owner to make sure he is happy with the output. You can read about how to conduct a Test Please on our Rules to Successful Projects: Do you conduct an internal "test please" prior to releasing a version to a client? Figure: If you find a deviation from the expected result you fix it on the Release branch. If during your final testing or your “Test Please” you find there are issues or bugs then you should fix them on the release branch. If you can’t fix them within the time box of your Sprint, then you will need to create a Bug and put it onto the backlog for prioritization by the Product owner. Make sure you leave plenty of time between your merge from the development branch to find and fix any problems that are uncovered. This process is commonly called Stabilization and should always be conducted once you have completed all of your User Stories and integrated all of your branches. Even once you have stabilized and released, you should not delete the release branch as you would with the Sprint branch. It has a usefulness for servicing that may extend well beyond the limited life you expect of it. Note: Don't get forced by the business into adding features into a Release branch instead that indicates the unspoken requirement is that they are asking for a product spin-off. In this case you can create a new Team Project and branch from the required Release branch to create a new Main branch for that product. And you create a whole new backlog to work from. Figure: When the Team decides it is happy with the product you can create a RTM branch. Once you have fixed all the bugs you can, and added any you can’t to the Product Backlog, and you Team is happy with the result you can create a Release. This would consist of doing the final Build and Packaging it up ready for your Sprint Review meeting. You would then create a read-only branch that represents the code you “shipped”. This is really an Audit trail branch that is optional, but is good practice. You could use a Label, but Labels are not Auditable and if a dispute was raised by the customer you can produce a verifiable version of the source code for an independent party to check. Rare I know, but you do not want to be at the wrong end of a legal battle. Like the Release branch the RTM branch should never be deleted, or only deleted according to your companies legal policy, which in the UK is usually 7 years. Figure: If you have made any changes in the Release you will need to merge back up to Main in order to finalise the changes. Nothing is really ever done until it is in Main. The same rules apply when merging any fixes in the Release branch back into Main and you should do a reverse merge before a forward merge, again for the muscle memory more than necessity at this stage. Your Sprint is now nearly complete, and you can have a Sprint Review meeting knowing that you have made every effort and taken every precaution to protect your customer’s investment. Note: In order to really achieve protection for both you and your client you would add Automated Builds, Automated Tests, Automated Acceptance tests, Acceptance test tracking, Unit Tests, Load tests, Web test and all the other good engineering practices that help produce reliable software. Figure: After the Sprint Planning meeting the process begins again. Where the Sprint Review and Retrospective meetings mark the end of the Sprint, the Sprint Planning meeting marks the beginning. After you have completed your Sprint Planning and you know what you are trying to achieve in Sprint 2 you can create your new Branch to develop in. How do we handle a bug(s) in production that can’t wait? Although in Scrum the only work done should be on the backlog there should be a little buffer added to the Sprint Planning for contingencies. One of these contingencies is a bug in the current release that can’t wait for the Sprint to finish. But how do you handle that? Willy-Peter Schaub asked an excellent question on the release activities: In reality Sprint 2 starts when sprint 1 ends + weekend. Should we not cater for a possible parallelism between Sprint 2 and the release activities of sprint 1? It would introduce FI’s from main to sprint 2, I guess. Your “Figure: Merging print 2 back into Main.” covers, what I tend to believe to be reality in most cases. - Willy-Peter Schaub, VS ALM Ranger, Microsoft I agree, and if you have a single Scrum team then your resources are limited. The Scrum Team is responsible for packaging and release, so at least one run at stabilization, package and release should be included in the Sprint time box. If more are needed on the current production release during the Sprint 2 time box then resource needs to be pulled from Sprint 2. The Product Owner and the Team have four choices (in order of disruption/cost): Backlog: Add the bug to the backlog and fix it in the next Sprint Buffer Time: Use any buffer time included in the current Sprint to fix the bug quickly Make time: Remove a Story from the current Sprint that is of equal value to the time lost fixing the bug(s) and releasing. Note: The Team must agree that it can still meet the Sprint Goal. Cancel Sprint: Cancel the sprint and concentrate all resource on fixing the bug(s) Note: This can be a very costly if the current sprint has already had a lot of work completed as it will be lost. The choice will depend on the complexity and severity of the bug(s) and both the Product Owner and the Team need to agree. In this case we will go with option #2 or #3 as they are uncomplicated but severe bugs. Figure: Real world issue where a bug needs fixed in the current release. If the bug(s) is urgent enough then then your only option is to fix it in place. You can edit the release branch to find and fix the bug, hopefully creating a test so it can’t happen again. Follow the prior process and conduct an internal and customer “Test Please” before releasing. You can read about how to conduct a Test Please on our Rules to Successful Projects: Do you conduct an internal "test please" prior to releasing a version to a client? Figure: After you have fixed the bug you need to ship again. You then need to again create an RTM branch to hold the version of the code you released in escrow. Figure: Main is now out of sync with your Release. We now need to get these new changes back up into the Main branch. Do a reverse and then forward merge again to get the new code into Main. But what about the branch, are developers not working on Sprint 2? Does Sprint 2 now have changes that are not in Main and Main now have changes that are not in Sprint 2? Well, yes… and this is part of the hit you take doing branching. But would this scenario even have been possible without branching? Figure: Getting the changes in Main into Sprint 2 is very important. The Team now needs to do a Forward Integration merge into their Sprint and resolve any conflicts that occur. Maybe the bug has already been fixed in Sprint 2, maybe the bug no longer exists! This needs to be identified and resolved by the developers before they continue to get further out of Sync with Main. Note: Avoid the “Big bang merge” at all costs. Figure: Merging Sprint 2 back into Main, the Forward Integration, and R0 terminates. Sprint 2 now merges (Reverse Integration) back into Main following the procedures we have already established. Figure: The logical conclusion. This then allows the creation of the next release. By now you should be getting the big picture and hopefully you learned something useful from this post. I know I have enjoyed writing it as I find these exploratory posts coupled with real world experience really help harden my understanding. Branching is a tool; it is not a silver bullet. Don’t over use it, and avoid “Anti-Patterns” where possible. Although the diagram above looks complicated I hope showing you how it is formed simplifies it as much as possible. Technorati Tags: Branching,Scrum,VS ALM,TFS 2010,VS2010

Read the article
Improving Partitioned Table Join Performance

- by Paul White

The query optimizer does not always choose an optimal strategy when joining partitioned tables. This post looks at an example, showing how a manual rewrite of the query can almost double performance, while reducing the memory grant to almost nothing. Test Data The two tables in this example use a common partitioning partition scheme. The partition function uses 41 equal-size partitions: CREATE PARTITION FUNCTION PFT (integer) AS RANGE RIGHT FOR VALUES ( 125000, 250000, 375000, 500000, 625000, 750000, 875000, 1000000, 1125000, 1250000, 1375000, 1500000, 1625000, 1750000, 1875000, 2000000, 2125000, 2250000, 2375000, 2500000, 2625000, 2750000, 2875000, 3000000, 3125000, 3250000, 3375000, 3500000, 3625000, 3750000, 3875000, 4000000, 4125000, 4250000, 4375000, 4500000, 4625000, 4750000, 4875000, 5000000 ); GO CREATE PARTITION SCHEME PST AS PARTITION PFT ALL TO ([PRIMARY]); There two tables are: CREATE TABLE dbo.T1 ( TID integer NOT NULL IDENTITY(0,1), Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x, CONSTRAINT PK_T1 PRIMARY KEY CLUSTERED (TID) ON PST (TID) ); CREATE TABLE dbo.T2 ( TID integer NOT NULL, Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x, CONSTRAINT PK_T2 PRIMARY KEY CLUSTERED (TID, Column1) ON PST (TID) ); The next script loads 5 million rows into T1 with a pseudo-random value between 1 and 5 for Column1. The table is partitioned on the IDENTITY column TID: INSERT dbo.T1 WITH (TABLOCKX) (Column1) SELECT (ABS(CHECKSUM(NEWID())) % 5) + 1 FROM dbo.Numbers AS N WHERE n BETWEEN 1 AND 5000000; In case you don’t already have an auxiliary table of numbers lying around, here’s a script to create one with 10 million rows: CREATE TABLE dbo.Numbers (n bigint PRIMARY KEY); WITH L0 AS(SELECT 1 AS c UNION ALL SELECT 1), L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B), L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B), Nums AS(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L5) INSERT dbo.Numbers WITH (TABLOCKX) SELECT TOP (10000000) n FROM Nums ORDER BY n OPTION (MAXDOP 1); Table T1 contains data like this: Next we load data into table T2. The relationship between the two tables is that table 2 contains ‘n’ rows for each row in table 1, where ‘n’ is determined by the value in Column1 of table T1. There is nothing particularly special about the data or distribution, by the way. INSERT dbo.T2 WITH (TABLOCKX) (TID, Column1) SELECT T.TID, N.n FROM dbo.T1 AS T JOIN dbo.Numbers AS N ON N.n >= 1 AND N.n <= T.Column1; Table T2 ends up containing about 15 million rows: The primary key for table T2 is a combination of TID and Column1. The data is partitioned according to the value in column TID alone. Partition Distribution The following query shows the number of rows in each partition of table T1: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T1 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are 40 partitions containing 125,000 rows (40 * 125k = 5m rows). The rightmost partition remains empty. The next query shows the distribution for table 2: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T2 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are roughly 375,000 rows in each partition (the rightmost partition is also empty): Ok, that’s the test data done. Test Query and Execution Plan The task is to count the rows resulting from joining tables 1 and 2 on the TID column: SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME(); SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID; SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; The optimizer chooses a plan using parallel hash join, and partial aggregation: The Plan Explorer plan tree view shows accurate cardinality estimates and an even distribution of rows across threads (click to enlarge the image): With a warm data cache, the STATISTICS IO output shows that no physical I/O was needed, and all 41 partitions were touched: Running the query without actual execution plan or STATISTICS IO information for maximum performance, the query returns in around 2600ms. Execution Plan Analysis The first step toward improving on the execution plan produced by the query optimizer is to understand how it works, at least in outline. The two parallel Clustered Index Scans use multiple threads to read rows from tables T1 and T2. Parallel scan uses a demand-based scheme where threads are given page(s) to scan from the table as needed. This arrangement has certain important advantages, but does result in an unpredictable distribution of rows amongst threads. The point is that multiple threads cooperate to scan the whole table, but it is impossible to predict which rows end up on which threads. For correct results from the parallel hash join, the execution plan has to ensure that rows from T1 and T2 that might join are processed on the same thread. For example, if a row from T1 with join key value ‘1234’ is placed in thread 5’s hash table, the execution plan must guarantee that any rows from T2 that also have join key value ‘1234’ probe thread 5’s hash table for matches. The way this guarantee is enforced in this parallel hash join plan is by repartitioning rows to threads after each parallel scan. The two repartitioning exchanges route rows to threads using a hash function over the hash join keys. The two repartitioning exchanges use the same hash function so rows from T1 and T2 with the same join key must end up on the same hash join thread. Expensive Exchanges This business of repartitioning rows between threads can be very expensive, especially if a large number of rows is involved. The execution plan selected by the optimizer moves 5 million rows through one repartitioning exchange and around 15 million across the other. As a first step toward removing these exchanges, consider the execution plan selected by the optimizer if we join just one partition from each table, disallowing parallelism: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = 1 AND $PARTITION.PFT(T2.TID) = 1 OPTION (MAXDOP 1); The optimizer has chosen a (one-to-many) merge join instead of a hash join. The single-partition query completes in around 100ms. If everything scaled linearly, we would expect that extending this strategy to all 40 populated partitions would result in an execution time around 4000ms. Using parallelism could reduce that further, perhaps to be competitive with the parallel hash join chosen by the optimizer. This raises a question. If the most efficient way to join one partition from each of the tables is to use a merge join, why does the optimizer not choose a merge join for the full query? Forcing a Merge Join Let’s force the optimizer to use a merge join on the test query using a hint: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN); This is the execution plan selected by the optimizer: This plan results in the same number of logical reads reported previously, but instead of 2600ms the query takes 5000ms. The natural explanation for this drop in performance is that the merge join plan is only using a single thread, whereas the parallel hash join plan could use multiple threads. Parallel Merge Join We can get a parallel merge join plan using the same query hint as before, and adding trace flag 8649: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN, QUERYTRACEON 8649); The execution plan is: This looks promising. It uses a similar strategy to distribute work across threads as seen for the parallel hash join. In practice though, performance is disappointing. On a typical run, the parallel merge plan runs for around 8400ms; slower than the single-threaded merge join plan (5000ms) and much worse than the 2600ms for the parallel hash join. We seem to be going backwards! The logical reads for the parallel merge are still exactly the same as before, with no physical IOs. The cardinality estimates and thread distribution are also still very good (click to enlarge): A big clue to the reason for the poor performance is shown in the wait statistics (captured by Plan Explorer Pro): CXPACKET waits require careful interpretation, and are most often benign, but in this case excessive waiting occurs at the repartitioning exchanges. Unlike the parallel hash join, the repartitioning exchanges in this plan are order-preserving ‘merging’ exchanges (because merge join requires ordered inputs): Parallelism works best when threads can just grab any available unit of work and get on with processing it. Preserving order introduces inter-thread dependencies that can easily lead to significant waits occurring. In extreme cases, these dependencies can result in an intra-query deadlock, though the details of that will have to wait for another time to explore in detail. The potential for waits and deadlocks leads the query optimizer to cost parallel merge join relatively highly, especially as the degree of parallelism (DOP) increases. This high costing resulted in the optimizer choosing a serial merge join rather than parallel in this case. The test results certainly confirm its reasoning. Collocated Joins In SQL Server 2008 and later, the optimizer has another available strategy when joining tables that share a common partition scheme. This strategy is a collocated join, also known as as a per-partition join. It can be applied in both serial and parallel execution plans, though it is limited to 2-way joins in the current optimizer. Whether the optimizer chooses a collocated join or not depends on cost estimation. The primary benefits of a collocated join are that it eliminates an exchange and requires less memory, as we will see next. Costing and Plan Selection The query optimizer did consider a collocated join for our original query, but it was rejected on cost grounds. The parallel hash join with repartitioning exchanges appeared to be a cheaper option. There is no query hint to force a collocated join, so we have to mess with the costing framework to produce one for our test query. Pretending that IOs cost 50 times more than usual is enough to convince the optimizer to use collocated join with our test query: -- Pretend IOs are 50x cost temporarily DBCC SETIOWEIGHT(50); -- Co-located hash join SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (RECOMPILE); -- Reset IO costing DBCC SETIOWEIGHT(1); Collocated Join Plan The estimated execution plan for the collocated join is: The Constant Scan contains one row for each partition of the shared partitioning scheme, from 1 to 41. The hash repartitioning exchanges seen previously are replaced by a single Distribute Streams exchange using Demand partitioning. Demand partitioning means that the next partition id is given to the next parallel thread that asks for one. My test machine has eight logical processors, and all are available for SQL Server to use. As a result, there are eight threads in the single parallel branch in this plan, each processing one partition from each table at a time. Once a thread finishes processing a partition, it grabs a new partition number from the Distribute Streams exchange…and so on until all partitions have been processed. It is important to understand that the parallel scans in this plan are different from the parallel hash join plan. Although the scans have the same parallelism icon, tables T1 and T2 are not being co-operatively scanned by multiple threads in the same way. Each thread reads a single partition of T1 and performs a hash match join with the same partition from table T2. The properties of the two Clustered Index Scans show a Seek Predicate (unusual for a scan!) limiting the rows to a single partition: The crucial point is that the join between T1 and T2 is on TID, and TID is the partitioning column for both tables. A thread that processes partition ‘n’ is guaranteed to see all rows that can possibly join on TID for that partition. In addition, no other thread will see rows from that partition, so this removes the need for repartitioning exchanges. CPU and Memory Efficiency Improvements The collocated join has removed two expensive repartitioning exchanges and added a single exchange processing 41 rows (one for each partition id). Remember, the parallel hash join plan exchanges had to process 5 million and 15 million rows. The amount of processor time spent on exchanges will be much lower in the collocated join plan. In addition, the collocated join plan has a maximum of 8 threads processing single partitions at any one time. The 41 partitions will all be processed eventually, but a new partition is not started until a thread asks for it. Threads can reuse hash table memory for the new partition. The parallel hash join plan also had 8 hash tables, but with all 5,000,000 build rows loaded at the same time. The collocated plan needs memory for only 8 * 125,000 = 1,000,000 rows at any one time. Collocated Hash Join Performance The collated join plan has disappointing performance in this case. The query runs for around 25,300ms despite the same IO statistics as usual. This is much the worst result so far, so what went wrong? It turns out that cardinality estimation for the single partition scans of table T1 is slightly low. The properties of the Clustered Index Scan of T1 (graphic immediately above) show the estimation was for 121,951 rows. This is a small shortfall compared with the 125,000 rows actually encountered, but it was enough to cause the hash join to spill to physical tempdb: A level 1 spill doesn’t sound too bad, until you realize that the spill to tempdb probably occurs for each of the 41 partitions. As a side note, the cardinality estimation error is a little surprising because the system tables accurately show there are 125,000 rows in every partition of T1. Unfortunately, the optimizer uses regular column and index statistics to derive cardinality estimates here rather than system table information (e.g. sys.partitions). Collocated Merge Join We will never know how well the collocated parallel hash join plan might have worked without the cardinality estimation error (and the resulting 41 spills to tempdb) but we do know: Merge join does not require a memory grant; and Merge join was the optimizer’s preferred join option for a single partition join Putting this all together, what we would really like to see is the same collocated join strategy, but using merge join instead of hash join. Unfortunately, the current query optimizer cannot produce a collocated merge join; it only knows how to do collocated hash join. So where does this leave us? CROSS APPLY sys.partitions We can try to write our own collocated join query. We can use sys.partitions to find the partition numbers, and CROSS APPLY to get a count per partition, with a final step to sum the partial counts. The following query implements this idea: SELECT row_count = SUM(Subtotals.cnt) FROM ( -- Partition numbers SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1 ) AS P CROSS APPLY ( -- Count per collocated join SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; The estimated plan is: The cardinality estimates aren’t all that good here, especially the estimate for the scan of the system table underlying the sys.partitions view. Nevertheless, the plan shape is heading toward where we would like to be. Each partition number from the system table results in a per-partition scan of T1 and T2, a one-to-many Merge Join, and a Stream Aggregate to compute the partial counts. The final Stream Aggregate just sums the partial counts. Execution time for this query is around 3,500ms, with the same IO statistics as always. This compares favourably with 5,000ms for the serial plan produced by the optimizer with the OPTION (MERGE JOIN) hint. This is another case of the sum of the parts being less than the whole – summing 41 partial counts from 41 single-partition merge joins is faster than a single merge join and count over all partitions. Even so, this single-threaded collocated merge join is not as quick as the original parallel hash join plan, which executed in 2,600ms. On the positive side, our collocated merge join uses only one logical processor and requires no memory grant. The parallel hash join plan used 16 threads and reserved 569 MB of memory: Using a Temporary Table Our collocated merge join plan should benefit from parallelism. The reason parallelism is not being used is that the query references a system table. We can work around that by writing the partition numbers to a temporary table (or table variable): SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME(); CREATE TABLE #P ( partition_number integer PRIMARY KEY); INSERT #P (partition_number) SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1; SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; DROP TABLE #P; SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; Using the temporary table adds a few logical reads, but the overall execution time is still around 3500ms, indistinguishable from the same query without the temporary table. The problem is that the query optimizer still doesn’t choose a parallel plan for this query, though the removal of the system table reference means that it could if it chose to: In fact the optimizer did enter the parallel plan phase of query optimization (running search 1 for a second time): Unfortunately, the parallel plan found seemed to be more expensive than the serial plan. This is a crazy result, caused by the optimizer’s cost model not reducing operator CPU costs on the inner side of a nested loops join. Don’t get me started on that, we’ll be here all night. In this plan, everything expensive happens on the inner side of a nested loops join. Without a CPU cost reduction to compensate for the added cost of exchange operators, candidate parallel plans always look more expensive to the optimizer than the equivalent serial plan. Parallel Collocated Merge Join We can produce the desired parallel plan using trace flag 8649 again: SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: One difference between this plan and the collocated hash join plan is that a Repartition Streams exchange operator is used instead of Distribute Streams. The effect is similar, though not quite identical. The Repartition uses round-robin partitioning, meaning the next partition id is pushed to the next thread in sequence. The Distribute Streams exchange seen earlier used Demand partitioning, meaning the next partition id is pulled across the exchange by the next thread that is ready for more work. There are subtle performance implications for each partitioning option, but going into that would again take us too far off the main point of this post. Performance The important thing is the performance of this parallel collocated merge join – just 1350ms on a typical run. The list below shows all the alternatives from this post (all timings include creation, population, and deletion of the temporary table where appropriate) from quickest to slowest: Collocated parallel merge join: 1350ms Parallel hash join: 2600ms Collocated serial merge join: 3500ms Serial merge join: 5000ms Parallel merge join: 8400ms Collated parallel hash join: 25,300ms (hash spill per partition) The parallel collocated merge join requires no memory grant (aside from a paltry 1.2MB used for exchange buffers). This plan uses 16 threads at DOP 8; but 8 of those are (rather pointlessly) allocated to the parallel scan of the temporary table. These are minor concerns, but it turns out there is a way to address them if it bothers you. Parallel Collocated Merge Join with Demand Partitioning This final tweak replaces the temporary table with a hard-coded list of partition ids (dynamic SQL could be used to generate this query from sys.partitions): SELECT row_count = SUM(Subtotals.cnt) FROM ( VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10), (11),(12),(13),(14),(15),(16),(17),(18),(19),(20), (21),(22),(23),(24),(25),(26),(27),(28),(29),(30), (31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41) ) AS P (partition_number) CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: The parallel collocated hash join plan is reproduced below for comparison: The manual rewrite has another advantage that has not been mentioned so far: the partial counts (per partition) can be computed earlier than the partial counts (per thread) in the optimizer’s collocated join plan. The earlier aggregation is performed by the extra Stream Aggregate under the nested loops join. The performance of the parallel collocated merge join is unchanged at around 1350ms. Final Words It is a shame that the current query optimizer does not consider a collocated merge join (Connect item closed as Won’t Fix). The example used in this post showed an improvement in execution time from 2600ms to 1350ms using a modestly-sized data set and limited parallelism. In addition, the memory requirement for the query was almost completely eliminated – down from 569MB to 1.2MB. The problem with the parallel hash join selected by the optimizer is that it attempts to process the full data set all at once (albeit using eight threads). It requires a large memory grant to hold all 5 million rows from table T1 across the eight hash tables, and does not take advantage of the divide-and-conquer opportunity offered by the common partitioning. The great thing about the collocated join strategies is that each parallel thread works on a single partition from both tables, reading rows, performing the join, and computing a per-partition subtotal, before moving on to a new partition. From a thread’s point of view… If you have trouble visualizing what is happening from just looking at the parallel collocated merge join execution plan, let’s look at it again, but from the point of view of just one thread operating between the two Parallelism (exchange) operators. Our thread picks up a single partition id from the Distribute Streams exchange, and starts a merge join using ordered rows from partition 1 of table T1 and partition 1 of table T2. By definition, this is all happening on a single thread. As rows join, they are added to a (per-partition) count in the Stream Aggregate immediately above the Merge Join. Eventually, either T1 (partition 1) or T2 (partition 1) runs out of rows and the merge join stops. The per-partition count from the aggregate passes on through the Nested Loops join to another Stream Aggregate, which is maintaining a per-thread subtotal. Our same thread now picks up a new partition id from the exchange (say it gets id 9 this time). The count in the per-partition aggregate is reset to zero, and the processing of partition 9 of both tables proceeds just as it did for partition 1, and on the same thread. Each thread picks up a single partition id and processes all the data for that partition, completely independently from other threads working on other partitions. One thread might eventually process partitions (1, 9, 17, 25, 33, 41) while another is concurrently processing partitions (2, 10, 18, 26, 34) and so on for the other six threads at DOP 8. The point is that all 8 threads can execute independently and concurrently, continuing to process new partitions until the wider job (of which the thread has no knowledge!) is done. This divide-and-conquer technique can be much more efficient than simply splitting the entire workload across eight threads all at once. Related Reading Understanding and Using Parallelism in SQL Server Parallel Execution Plans Suck © 2013 Paul White – All Rights Reserved Twitter: @SQL_Kiwi

Read the article
Toorcon14

- by danx

Toorcon 2012 Information Security Conference San Diego, CA, http://www.toorcon.org/ Dan Anderson, October 2012 It's almost Halloween, and we all know what that means—yes, of course, it's time for another Toorcon Conference! Toorcon is an annual conference for people interested in computer security. This includes the whole range of hackers, computer hobbyists, professionals, security consultants, press, law enforcement, prosecutors, FBI, etc. We're at Toorcon 14—see earlier blogs for some of the previous Toorcon's I've attended (back to 2003). This year's "con" was held at the Westin on Broadway in downtown San Diego, California. The following are not necessarily my views—I'm just the messenger—although I could have misquoted or misparaphrased the speakers. Also, I only reviewed some of the talks, below, which I attended and interested me. MalAndroid—the Crux of Android Infections, Aditya K. Sood Programming Weird Machines with ELF Metadata, Rebecca "bx" Shapiro Privacy at the Handset: New FCC Rules?, Valkyrie Hacking Measured Boot and UEFI, Dan Griffin You Can't Buy Security: Building the Open Source InfoSec Program, Boris Sverdlik What Journalists Want: The Investigative Reporters' Perspective on Hacking, Dave Maas & Jason Leopold Accessibility and Security, Anna Shubina Stop Patching, for Stronger PCI Compliance, Adam Brand McAfee Secure & Trustmarks — a Hacker's Best Friend, Jay James & Shane MacDougall MalAndroid—the Crux of Android Infections Aditya K. Sood, IOActive, Michigan State PhD candidate Aditya talked about Android smartphone malware. There's a lot of old Android software out there—over 50% Gingerbread (2.3.x)—and most have unpatched vulnerabilities. Of 9 Android vulnerabilities, 8 have known exploits (such as the old Gingerbread Global Object Table exploit). Android protection includes sandboxing, security scanner, app permissions, and screened Android app market. The Android permission checker has fine-grain resource control, policy enforcement. Android static analysis also includes a static analysis app checker (bouncer), and a vulnerablity checker. What security problems does Android have? User-centric security, which depends on the user to grant permission and make smart decisions. But users don't care or think about malware (the're not aware, not paranoid). All they want is functionality, extensibility, mobility Android had no "proper" encryption before Android 3.0 No built-in protection against social engineering and web tricks Alternative Android app markets are unsafe. Simply visiting some markets can infect Android Aditya classified Android Malware types as: Type A—Apps. These interact with the Android app framework. For example, a fake Netflix app. Or Android Gold Dream (game), which uploads user files stealthy manner to a remote location. Type K—Kernel. Exploits underlying Linux libraries or kernel Type H—Hybrid. These use multiple layers (app framework, libraries, kernel). These are most commonly used by Android botnets, which are popular with Chinese botnet authors What are the threats from Android malware? These incude leak info (contacts), banking fraud, corporate network attacks, malware advertising, malware "Hackivism" (the promotion of social causes. For example, promiting specific leaders of the Tunisian or Iranian revolutions. Android malware is frequently "masquerated". That is, repackaged inside a legit app with malware. To avoid detection, the hidden malware is not unwrapped until runtime. The malware payload can be hidden in, for example, PNG files. Less common are Android bootkits—there's not many around. What they do is hijack the Android init framework—alteering system programs and daemons, then deletes itself. For example, the DKF Bootkit (China). Android App Problems: no code signing! all self-signed native code execution permission sandbox — all or none alternate market places no robust Android malware detection at network level delayed patch process Programming Weird Machines with ELF Metadata Rebecca "bx" Shapiro, Dartmouth College, NH https://github.com/bx/elf-bf-tools @bxsays on twitter Definitions. "ELF" is an executable file format used in linking and loading executables (on UNIX/Linux-class machines). "Weird machine" uses undocumented computation sources (I think of them as unintended virtual machines). Some examples of "weird machines" are those that: return to weird location, does SQL injection, corrupts the heap. Bx then talked about using ELF metadata as (an uintended) "weird machine". Some ELF background: A compiler takes source code and generates a ELF object file (hello.o). A static linker makes an ELF executable from the object file. A runtime linker and loader takes ELF executable and loads and relocates it in memory. The ELF file has symbols to relocate functions and variables. ELF has two relocation tables—one at link time and another one at loading time: .rela.dyn (link time) and .dynsym (dynamic table). GOT: Global Offset Table of addresses for dynamically-linked functions. PLT: Procedure Linkage Tables—works with GOT. The memory layout of a process (not the ELF file) is, in order: program (+ heap), dynamic libraries, libc, ld.so, stack (which includes the dynamic table loaded into memory) For ELF, the "weird machine" is found and exploited in the loader. ELF can be crafted for executing viruses, by tricking runtime into executing interpreted "code" in the ELF symbol table. One can inject parasitic "code" without modifying the actual ELF code portions. Think of the ELF symbol table as an "assembly language" interpreter. It has these elements: instructions: Add, move, jump if not 0 (jnz) Think of symbol table entries as "registers" symbol table value is "contents" immediate values are constants direct values are addresses (e.g., 0xdeadbeef) move instruction: is a relocation table entry add instruction: relocation table "addend" entry jnz instruction: takes multiple relocation table entries The ELF weird machine exploits the loader by relocating relocation table entries. The loader will go on forever until told to stop. It stores state on stack at "end" and uses IFUNC table entries (containing function pointer address). The ELF weird machine, called "Brainfu*k" (BF) has: 8 instructions: pointer inc, dec, inc indirect, dec indirect, jump forward, jump backward, print. Three registers - 3 registers Bx showed example BF source code that implemented a Turing machine printing "hello, world". More interesting was the next demo, where bx modified ping. Ping runs suid as root, but quickly drops privilege. BF modified the loader to disable the library function call dropping privilege, so it remained as root. Then BF modified the ping -t argument to execute the -t filename as root. It's best to show what this modified ping does with an example: $ whoami bx $ ping localhost -t backdoor.sh # executes backdoor $ whoami root $ The modified code increased from 285948 bytes to 290209 bytes. A BF tool compiles "executable" by modifying the symbol table in an existing ELF executable. The tool modifies .dynsym and .rela.dyn table, but not code or data. Privacy at the Handset: New FCC Rules? "Valkyrie" (Christie Dudley, Santa Clara Law JD candidate) Valkyrie talked about mobile handset privacy. Some background: Senator Franken (also a comedian) became alarmed about CarrierIQ, where the carriers track their customers. Franken asked the FCC to find out what obligations carriers think they have to protect privacy. The carriers' response was that they are doing just fine with self-regulation—no worries! Carriers need to collect data, such as missed calls, to maintain network quality. But carriers also sell data for marketing. Verizon sells customer data and enables this with a narrow privacy policy (only 1 month to opt out, with difficulties). The data sold is not individually identifiable and is aggregated. But Verizon recommends, as an aggregation workaround to "recollate" data to other databases to identify customers indirectly. The FCC has regulated telephone privacy since 1934 and mobile network privacy since 2007. Also, the carriers say mobile phone privacy is a FTC responsibility (not FCC). FTC is trying to improve mobile app privacy, but FTC has no authority over carrier / customer relationships. As a side note, Apple iPhones are unique as carriers have extra control over iPhones they don't have with other smartphones. As a result iPhones may be more regulated. Who are the consumer advocates? Everyone knows EFF, but EPIC (Electrnic Privacy Info Center), although more obsecure, is more relevant. What to do? Carriers must be accountable. Opt-in and opt-out at any time. Carriers need incentive to grant users control for those who want it, by holding them liable and responsible for breeches on their clock. Location information should be added current CPNI privacy protection, and require "Pen/trap" judicial order to obtain (and would still be a lower standard than 4th Amendment). Politics are on a pro-privacy swing now, with many senators and the Whitehouse. There will probably be new regulation soon, and enforcement will be a problem, but consumers will still have some benefit. Hacking Measured Boot and UEFI Dan Griffin, JWSecure, Inc., Seattle, @JWSdan Dan talked about hacking measured UEFI boot. First some terms: UEFI is a boot technology that is replacing BIOS (has whitelisting and blacklisting). UEFI protects devices against rootkits. TPM - hardware security device to store hashs and hardware-protected keys "secure boot" can control at firmware level what boot images can boot "measured boot" OS feature that tracks hashes (from BIOS, boot loader, krnel, early drivers). "remote attestation" allows remote validation and control based on policy on a remote attestation server. Microsoft pushing TPM (Windows 8 required), but Google is not. Intel TianoCore is the only open source for UEFI. Dan has Measured Boot Tool at http://mbt.codeplex.com/ with a demo where you can also view TPM data. TPM support already on enterprise-class machines. UEFI Weaknesses. UEFI toolkits are evolving rapidly, but UEFI has weaknesses: assume user is an ally trust TPM implicitly, and attached to computer hibernate file is unprotected (disk encryption protects against this) protection migrating from hardware to firmware delays in patching and whitelist updates will UEFI really be adopted by the mainstream (smartphone hardware support, bank support, apathetic consumer support) You Can't Buy Security: Building the Open Source InfoSec Program Boris Sverdlik, ISDPodcast.com co-host Boris talked about problems typical with current security audits. "IT Security" is an oxymoron—IT exists to enable buiness, uptime, utilization, reporting, but don't care about security—IT has conflict of interest. There's no Magic Bullet ("blinky box"), no one-size-fits-all solution (e.g., Intrusion Detection Systems (IDSs)). Regulations don't make you secure. The cloud is not secure (because of shared data and admin access). Defense and pen testing is not sexy. Auditors are not solution (security not a checklist)—what's needed is experience and adaptability—need soft skills. Step 1: First thing is to Google and learn the company end-to-end before you start. Get to know the management team (not IT team), meet as many people as you can. Don't use arbitrary values such as CISSP scores. Quantitive risk assessment is a myth (e.g. AV*EF-SLE). Learn different Business Units, legal/regulatory obligations, learn the business and where the money is made, verify company is protected from script kiddies (easy), learn sensitive information (IP, internal use only), and start with low-hanging fruit (customer service reps and social engineering). Step 2: Policies. Keep policies short and relevant. Generic SANS "security" boilerplate policies don't make sense and are not followed. Focus on acceptable use, data usage, communications, physical security. Step 3: Implementation: keep it simple stupid. Open source, although useful, is not free (implementation cost). Access controls with authentication & authorization for local and remote access. MS Windows has it, otherwise use OpenLDAP, OpenIAM, etc. Application security Everyone tries to reinvent the wheel—use existing static analysis tools. Review high-risk apps and major revisions. Don't run different risk level apps on same system. Assume host/client compromised and use app-level security control. Network security VLAN != segregated because there's too many workarounds. Use explicit firwall rules, active and passive network monitoring (snort is free), disallow end user access to production environment, have a proxy instead of direct Internet access. Also, SSL certificates are not good two-factor auth and SSL does not mean "safe." Operational Controls Have change, patch, asset, & vulnerability management (OSSI is free). For change management, always review code before pushing to production For logging, have centralized security logging for business-critical systems, separate security logging from administrative/IT logging, and lock down log (as it has everything). Monitor with OSSIM (open source). Use intrusion detection, but not just to fulfill a checkbox: build rules from a whitelist perspective (snort). OSSEC has 95% of what you need. Vulnerability management is a QA function when done right: OpenVas and Seccubus are free. Security awareness The reality is users will always click everything. Build real awareness, not compliance driven checkbox, and have it integrated into the culture. Pen test by crowd sourcing—test with logging COSSP http://www.cossp.org/ - Comprehensive Open Source Security Project What Journalists Want: The Investigative Reporters' Perspective on Hacking Dave Maas, San Diego CityBeat Jason Leopold, Truthout.org The difference between hackers and investigative journalists: For hackers, the motivation varies, but method is same, technological specialties. For investigative journalists, it's about one thing—The Story, and they need broad info-gathering skills. J-School in 60 Seconds: Generic formula: Person or issue of pubic interest, new info, or angle. Generic criteria: proximity, prominence, timeliness, human interest, oddity, or consequence. Media awareness of hackers and trends: journalists becoming extremely aware of hackers with congressional debates (privacy, data breaches), demand for data-mining Journalists, use of coding and web development for Journalists, and Journalists busted for hacking (Murdock). Info gathering by investigative journalists include Public records laws. Federal Freedom of Information Act (FOIA) is good, but slow. California Public Records Act is a lot stronger. FOIA takes forever because of foot-dragging—it helps to be specific. Often need to sue (especially FBI). CPRA is faster, and requests can be vague. Dumps and leaks (a la Wikileaks) Journalists want: leads, protecting ourselves, our sources, and adapting tools for news gathering (Google hacking). Anonomity is important to whistleblowers. They want no digital footprint left behind (e.g., email, web log). They don't trust encryption, want to feel safe and secure. Whistleblower laws are very weak—there's no upside for whistleblowers—they have to be very passionate to do it. Accessibility and Security or: How I Learned to Stop Worrying and Love the Halting Problem Anna Shubina, Dartmouth College Anna talked about how accessibility and security are related. Accessibility of digital content (not real world accessibility). mostly refers to blind users and screenreaders, for our purpose. Accessibility is about parsing documents, as are many security issues. "Rich" executable content causes accessibility to fail, and often causes security to fail. For example MS Word has executable format—it's not a document exchange format—more dangerous than PDF or HTML. Accessibility is often the first and maybe only sanity check with parsing. They have no choice because someone may want to read what you write. Google, for example, is very particular about web browser you use and are bad at supporting other browsers. Uses JavaScript instead of links, often requiring mouseover to display content. PDF is a security nightmare. Executible format, embedded flash, JavaScript, etc. 15 million lines of code. Google Chrome doesn't handle PDF correctly, causing several security bugs. PDF has an accessibility checker and PDF tagging, to help with accessibility. But no PDF checker checks for incorrect tags, untagged content, or validates lists or tables. None check executable content at all. The "Halting Problem" is: can one decide whether a program will ever stop? The answer, in general, is no (Rice's theorem). The same holds true for accessibility checkers. Language-theoretic Security says complicated data formats are hard to parse and cannot be solved due to the Halting Problem. W3C Web Accessibility Guidelines: "Perceivable, Operable, Understandable, Robust" Not much help though, except for "Robust", but here's some gems: * all information should be parsable (paraphrasing) * if not parsable, cannot be converted to alternate formats * maximize compatibility in new document formats Executible webpages are bad for security and accessibility. They say it's for a better web experience. But is it necessary to stuff web pages with JavaScript for a better experience? A good example is The Drudge Report—it has hand-written HTML with no JavaScript, yet drives a lot of web traffic due to good content. A bad example is Google News—hidden scrollbars, guessing user input. Solutions: Accessibility and security problems come from same source Expose "better user experience" myth Keep your corner of Internet parsable Remember "Halting Problem"—recognize false solutions (checking and verifying tools) Stop Patching, for Stronger PCI Compliance Adam Brand, protiviti @adamrbrand, http://www.picfun.com/ Adam talked about PCI compliance for retail sales. Take an example: for PCI compliance, 50% of Brian's time (a IT guy), 960 hours/year was spent patching POSs in 850 restaurants. Often applying some patches make no sense (like fixing a browser vulnerability on a server). "Scanner worship" is overuse of vulnerability scanners—it gives a warm and fuzzy and it's simple (red or green results—fix reds). Scanners give a false sense of security. In reality, breeches from missing patches are uncommon—more common problems are: default passwords, cleartext authentication, misconfiguration (firewall ports open). Patching Myths: Myth 1: install within 30 days of patch release (but PCI Â§6.1 allows a "risk-based approach" instead). Myth 2: vendor decides what's critical (also PCI Â§6.1). But Â§6.2 requires user ranking of vulnerabilities instead. Myth 3: scan and rescan until it passes. But PCI Â§11.2.1b says this applies only to high-risk vulnerabilities. Adam says good recommendations come from NIST 800-40. Instead use sane patching and focus on what's really important. From NIST 800-40: Proactive: Use a proactive vulnerability management process: use change control, configuration management, monitor file integrity. Monitor: start with NVD and other vulnerability alerts, not scanner results. Evaluate: public-facing system? workstation? internal server? (risk rank) Decide:on action and timeline Test: pre-test patches (stability, functionality, rollback) for change control Install: notify, change control, tickets McAfee Secure & Trustmarks — a Hacker's Best Friend Jay James, Shane MacDougall, Tactical Intelligence Inc., Canada "McAfee Secure Trustmark" is a website seal marketed by McAfee. A website gets this badge if they pass their remote scanning. The problem is a removal of trustmarks act as flags that you're vulnerable. Easy to view status change by viewing McAfee list on website or on Google. "Secure TrustGuard" is similar to McAfee. Jay and Shane wrote Perl scripts to gather sites from McAfee and search engines. If their certification image changes to a 1x1 pixel image, then they are longer certified. Their scripts take deltas of scans to see what changed daily. The bottom line is change in TrustGuard status is a flag for hackers to attack your site. Entire idea of seals is silly—you're raising a flag saying if you're vulnerable.

Read the article
Using BPEL Performance Statistics to Diagnose Performance Bottlenecks

- by fip

Tuning performance of Oracle SOA 11G applications could be challenging. Because SOA is a platform for you to build composite applications that connect many applications and "services", when the overall performance is slow, the bottlenecks could be anywhere in the system: the applications/services that SOA connects to, the infrastructure database, or the SOA server itself.How to quickly identify the bottleneck becomes crucial in tuning the overall performance. Fortunately, the BPEL engine in Oracle SOA 11G (and 10G, for that matter) collects BPEL Engine Performance Statistics, which show the latencies of low level BPEL engine activities. The BPEL engine performance statistics can make it a bit easier for you to identify the performance bottleneck. Although the BPEL engine performance statistics are always available, the access to and interpretation of them are somewhat obscure in the early and current (PS5) 11G versions. This blog attempts to offer instructions that help you to enable, retrieve and interpret the performance statistics, before the future versions provides a more pleasant user experience. Overview of BPEL Engine Performance Statistics SOA BPEL has a feature of collecting some performance statistics and store them in memory. One MBean attribute, StatLastN, configures the size of the memory buffer to store the statistics. This memory buffer is a "moving window", in a way that old statistics will be flushed out by the new if the amount of data exceeds the buffer size. Since the buffer size is limited by StatLastN, impacts of statistics collection on performance is minimal. By default StatLastN=-1, which means no collection of performance data. Once the statistics are collected in the memory buffer, they can be retrieved via another MBean oracle.as.soainfra.bpel:Location=[Server Name],name=BPELEngine,type=BPELEngine.> My friend in Oracle SOA development wrote this simple 'bpelstat' web app that looks up and retrieves the performance data from the MBean and displays it in a human readable form. It does not have beautiful UI but it is fairly useful. Although in Oracle SOA 11.1.1.5 onwards the same statistics can be viewed via a more elegant UI under "request break down" at EM -> SOA Infrastructure -> Service Engines -> BPEL -> Statistics, some unsophisticated minds like mine may still prefer the simplicity of the 'bpelstat' JSP. One thing that simple JSP does do well is that you can save the page and send it to someone to further analyze Follows are the instructions of how to install and invoke the BPEL statistic JSP. My friend in SOA Development will soon blog about interpreting the statistics. Stay tuned. Step1: Enable BPEL Engine Statistics for Each SOA Servers via Enterprise Manager First st you need to set the StatLastN to some number as a way to enable the collection of BPEL Engine Performance Statistics EM Console -> soa-infra(Server Name) -> SOA Infrastructure -> SOA Administration -> BPEL Properties Click on "More BPEL Configuration Properties" Click on attribute "StatLastN", set its value to some integer number. Typically you want to set it 1000 or more. Step 2: Download and Deploy bpelstat.war File to Admin Server, Note: the WAR file contains a JSP that does NOT have any security restriction. You do NOT want to keep in your production server for a long time as it is a security hazard. Deactivate the war once you are done. Download the bpelstat.war to your local PC At WebLogic Console, Go to Deployments -> Install Click on the "upload your file(s)" Click the "Browse" button to upload the deployment to Admin Server Accept the uploaded file as the path, click next Check the default option "Install this deployment as an application" Check "AdminServer" as the target server Finish the rest of the deployment with default settings Console -> Deployments Check the box next to "bpelstat" application Click on the "Start" button. It will change the state of the app from "prepared" to "active" Step 3: Invoke the BPEL Statistic Tool The BPELStat tool merely call the MBean of BPEL server and collects and display the in-memory performance statics. You usually want to do that after some peak loads. Go to http://<admin-server-host>:<admin-server-port>/bpelstat Enter the correct admin hostname, port, username and password Enter the SOA Server Name from which you want to collect the performance statistics. For example, SOA_MS1, etc. Click Submit Keep doing the same for all SOA servers. Step 3: Interpret the BPEL Engine Statistics You will see a few categories of BPEL Statistics from the JSP Page. First it starts with the overall latency of BPEL processes, grouped by synchronous and asynchronous processes. Then it provides the further break down of the measurements through the life time of a BPEL request, which is called the "request break down". 1. Overall latency of BPEL processes The top of the page shows that the elapse time of executing the synchronous process TestSyncBPELProcess from the composite TestComposite averages at about 1543.21ms, while the elapse time of executing the asynchronous process TestAsyncBPELProcess from the composite TestComposite2 averages at about 1765.43ms. The maximum and minimum latency were also shown. Synchronous process statistics <statistics> <stats key="default/TestComposite!2.0.2-ScopedJMSOSB*soa_bfba2527-a9ba-41a7-95c5-87e49c32f4ff/TestSyncBPELProcess" min="1234" max="4567" average="1543.21" count="1000"> </stats> </statistics> Asynchronous process statistics <statistics> <stats key="default/TestComposite2!2.0.2-ScopedJMSOSB*soa_bfba2527-a9ba-41a7-95c5-87e49c32f4ff/TestAsyncBPELProcess" min="2234" max="3234" average="1765.43" count="1000"> </stats> </statistics> 2. Request break down Under the overall latency categorized by synchronous and asynchronous processes is the "Request breakdown". Organized by statistic keys, the Request breakdown gives finer grain performance statistics through the life time of the BPEL requests.It uses indention to show the hierarchy of the statistics. Request breakdown <statistics> <stats key="eng-composite-request" min="0" max="0" average="0.0" count="0"> <stats key="eng-single-request" min="22" max="606" average="258.43" count="277"> <stats key="populate-context" min="0" max="0" average="0.0" count="248"> Please note that in SOA 11.1.1.6, the statistics under Request breakdown is aggregated together cross all the BPEL processes based on statistic keys. It does not differentiate between BPEL processes. If two BPEL processes happen to have the statistic that share same statistic key, the statistics from two BPEL processes will be aggregated together. Keep this in mind when we go through more details below. 2.1 BPEL process activity latencies A very useful measurement in the Request Breakdown is the performance statistics of the BPEL activities you put in your BPEL processes: Assign, Invoke, Receive, etc. The names of the measurement in the JSP page directly come from the names to assign to each BPEL activity. These measurements are under the statistic key "actual-perform" Example 1: Follows is the measurement for BPEL activity "AssignInvokeCreditProvider_Input", which looks like the Assign activity in a BPEL process that assign an input variable before passing it to the invocation: <stats key="AssignInvokeCreditProvider_Input" min="1" max="8" average="1.9" count="153"> <stats key="sensor-send-activity-data" min="0" max="1" average="0.0" count="306"> </stats> <stats key="sensor-send-variable-data" min="0" max="0" average="0.0" count="153"> </stats> <stats key="monitor-send-activity-data" min="0" max="0" average="0.0" count="306"> </stats> </stats> Note: because as previously mentioned that the statistics cross all BPEL processes are aggregated together based on statistic keys, if two BPEL processes happen to name their Invoke activity the same name, they will show up at one measurement (i.e. statistic key). Example 2: Follows is the measurement of BPEL activity called "InvokeCreditProvider". You can not only see that by average it takes 3.31ms to finish this call (pretty fast) but also you can see from the further break down that most of this 3.31 ms was spent on the "invoke-service". <stats key="InvokeCreditProvider" min="1" max="13" average="3.31" count="153"> <stats key="initiate-correlation-set-again" min="0" max="0" average="0.0" count="153"> </stats> <stats key="invoke-service" min="1" max="13" average="3.08" count="153"> <stats key="prep-call" min="0" max="1" average="0.04" count="153"> </stats> </stats> <stats key="initiate-correlation-set" min="0" max="0" average="0.0" count="153"> </stats> <stats key="sensor-send-activity-data" min="0" max="0" average="0.0" count="306"> </stats> <stats key="sensor-send-variable-data" min="0" max="0" average="0.0" count="153"> </stats> <stats key="monitor-send-activity-data" min="0" max="0" average="0.0" count="306"> </stats> <stats key="update-audit-trail" min="0" max="2" average="0.03" count="153"> </stats> </stats> 2.2 BPEL engine activity latency Another type of measurements under Request breakdown are the latencies of underlying system level engine activities. These activities are not directly tied to a particular BPEL process or process activity, but they are critical factors in the overall engine performance. These activities include the latency of saving asynchronous requests to database, and latency of process dehydration. My friend Malkit Bhasin is working on providing more information on interpreting the statistics on engine activities on his blog (https://blogs.oracle.com/malkit/). I will update this blog once the information becomes available. Update on 2012-10-02: My friend Malkit Bhasin has published the detail interpretation of the BPEL service engine statistics at his blog http://malkit.blogspot.com/2012/09/oracle-bpel-engine-soa-suite.html.

Read the article

Search Results

Search found 5377 results on 216 pages for 'robert low'.

Page 213/216 | < Previous Page | 209 210 211 212 213 214 215 216 | Next Page >

- by Kitteran

- by bob moch

- by user555854

- by DaveCol

- by twistedbrain

- by pranjal

- by Anadi Misra

- by Anadi Misra

- by Chris

- by Anadi Misra

- by Daniel

- by WishCow

- by whitequark

- by Geoff Dalgas

- by Kamil Kisiel

- by arungupta

- by shiju

- by Rick Strahl

- by Rick Strahl

- by Jon Galloway

- by Rick Strahl

- by Martin Hinshelwood

- by Paul White

- by danx

- by fip

< Previous Page | 209 210 211 212 213 214 215 216 | Next Page >