Killing Stuck Child JVMs
- by ACShorten
Note: This facility only applies to Oracle Utilities Application Framework products using COBOL.
In some situations, the Child JVMs may spin. This causes multiple startup/shutdown Child JVM messages to be displayed and recursive Child JVMs to be initiated and shunned. This may show up as the following message:
Unable to establish connection on port …. after waiting .. seconds.
The issue can be caused intermittently by CPU spins connected with the creation of new processes, specifically Child JVMs. Recursive (or double) invocation of the System.exit call in the remote JVM may be caused by the Process.destroy call that the parent JVM always issues when shunning a JVM. The issue may occur when the thread in the parent JVM that is responsible for recycling gets stuck, which affects all Child JVMs.
If this issue occurs at your site then there are a number of options to address the issue:
Configure an Operating System level kill command to force the Child JVM to be shunned when it becomes stuck.
Configure a Process.destroy command to be used if the kill command is not configured or desired.
Specify a time tolerance to detect stuck threads before issuing the Process.destroy or kill commands.
Note: This facility is also used when the Parent JVM is shut down, to ensure that no zombie Child JVMs exist.
The following additional settings must be added to the spl.properties for the Business Application Server to use this facility:
spl.runtime.cobol.remote.kill.command – Specify the command used to kill the Child JVM process. This can be a direct command or a script, which allows additional information to be provided. The property accepts two arguments, {pid} and {jvmNumber}, in the specified string. The arguments must be enclosed in curly braces as shown here.
Note: The pid is appended to the kill.command string unless the {pid} and {jvmNumber} arguments are specified. The jvmNumber can be useful if passed to a script for logging purposes.
Note: If a script is used it must be in the path and be executable by the OS user running the system.
spl.runtime.cobol.remote.destroy.enabled – Specify whether to use the Process.destroy command instead of the kill command. Specify true or false. Default value is false.
Note: If shunning JVMs is an issue, it is recommended to use the kill command option. This property can therefore remain at its default value, false, unless otherwise required.
spl.runtime.cobol.remote.kill.delaysecs – Specify the number of seconds to wait for the Child JVM to terminate naturally before issuing the Process.destroy or kill commands. Default is 10 seconds.
For example:
spl.runtime.cobol.remote.kill.command=kill -9 {pid} {jvmNumber}
spl.runtime.cobol.remote.destroy.enabled=false
spl.runtime.cobol.remote.kill.delaysecs=10
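The placeholder rule described for spl.runtime.cobol.remote.kill.command can be illustrated with a small shell sketch. This is not the framework's implementation: expand_kill_command is a hypothetical helper, and it assumes that the pid is appended only when neither placeholder appears in the string.

```shell
#!/bin/sh
# Illustration only: mimics the documented placeholder rule.
# expand_kill_command is a hypothetical helper, not part of the framework.
expand_kill_command() {
    template=$1   # value of spl.runtime.cobol.remote.kill.command
    pid=$2        # process id of the Child JVM
    jvm=$3        # Child JVM number

    case $template in
        *'{pid}'*|*'{jvmNumber}'*)
            # Assumption: any placeholder present means substitute in place.
            printf '%s\n' "$template" |
                sed -e "s/{pid}/$pid/g" -e "s/{jvmNumber}/$jvm/g"
            ;;
        *)
            # No placeholders: the pid is appended to the command string.
            printf '%s %s\n' "$template" "$pid"
            ;;
    esac
}

expand_kill_command 'forcequit.sh' 4321 2                     # prints: forcequit.sh 4321
expand_kill_command 'forcequit.sh {pid} {jvmNumber}' 4321 2   # prints: forcequit.sh 4321 2
```

Because a plain kill command treats every operand as a process id, the {jvmNumber} argument is most useful when the command is a script that knows what to do with it.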
When a Child JVM is to be recycled, these properties are inspected and the spl.runtime.cobol.remote.kill.command is executed, if provided. This is done after waiting for spl.runtime.cobol.remote.kill.delaysecs seconds to give the JVM time to shut itself down. The spl.runtime.cobol.remote.destroy.enabled property must be set to true AND the spl.runtime.cobol.remote.kill.command omitted for the original Process.destroy command to be used on the process.
Note: By default, spl.runtime.cobol.remote.destroy.enabled is set to false and Process.destroy is therefore disabled.
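The precedence between the two properties can be sketched as a small decision function. This is only a reading of the rule as described in this post, not the framework's code; select_recycle_action and its return values (kill, destroy, none) are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Illustration only: select_recycle_action is a hypothetical helper that
# reports which mechanism would apply once the delay has elapsed.
select_recycle_action() {
    kill_command=$1      # spl.runtime.cobol.remote.kill.command (may be empty)
    destroy_enabled=$2   # spl.runtime.cobol.remote.destroy.enabled (default false)

    if [ -n "$kill_command" ]; then
        echo "kill"      # a configured kill command takes precedence
    elif [ "$destroy_enabled" = "true" ]; then
        echo "destroy"   # Process.destroy only when no kill command is set
    else
        echo "none"      # the Child JVM is left to shut itself down
    fi
}

select_recycle_action 'forcequit.sh' true   # prints: kill
select_recycle_action '' true               # prints: destroy
select_recycle_action '' false              # prints: none
```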
If neither spl.runtime.cobol.remote.kill.command nor spl.runtime.cobol.remote.destroy.enabled is specified, Child JVMs will not be forcibly killed. They will be left to shut themselves down (which may lead to orphan JVMs). If both are specified, the spl.runtime.cobol.remote.kill.command is preferred and spl.runtime.cobol.remote.destroy.enabled is defaulted to false.
It is recommended to invoke a script to issue the kill command rather than using kill -9 directly.
For example, the following sample script ensures that the process id is an active cobjrun process before issuing the kill command:
forcequit.sh
#!/bin/sh
# Timestamp used as a prefix for every log entry
THETIME=`date +"%Y-%m-%d %H:%M:%S"`
# The process id must be supplied as the first argument
if [ "$1" = "" ]
then
  echo "$THETIME: Process Id is required" >>$SPLSYSTEMLOGS/forcequit.log
  exit 1
fi
javaexec=cobjrun
# grep exits with 0 only if the listing for this pid mentions cobjrun
ps e $1 | grep -c $javaexec
if [ $? = 0 ]
then
  echo "$THETIME: Process $1 is an active $javaexec process -- issuing kill -9 $1" >>$SPLSYSTEMLOGS/forcequit.log
  kill -9 $1
  exit 0
else
  echo "$THETIME: Process id $1 is not a $javaexec process or not active -- kill will not be issued" >>$SPLSYSTEMLOGS/forcequit.log
  exit 1
fi
This script's name would then be specified as the value for the spl.runtime.cobol.remote.kill.command property, for example:
spl.runtime.cobol.remote.kill.command=forcequit.sh
The forcequit.sh script is specified here without explicit arguments, but the pid is passed to it automatically.
To use the jvmNumber parameter, it must be explicitly specified in the command. For example, to call the script forcequit.sh and pass it the pid and the Child JVM number, specify it as follows:
spl.runtime.cobol.remote.kill.command=forcequit.sh {pid} {jvmNumber}
The script can then use the JVM number for logging purposes or to further ensure that the correct pid is being killed.
If the arguments are omitted, the pid is automatically appended to the spl.runtime.cobol.remote.kill.command string.
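As a rough sketch of how a script might use the JVM number for logging, the variant below accepts the pid as its first argument and the Child JVM number as its second. It is an assumption-laden illustration rather than a supplied script: the function names, the fallback log directory /tmp, and the use of ps -p with -o args= to inspect the command line are choices made for this example.

```shell
#!/bin/sh
# Sketch of a forcequit-style script that also records the Child JVM number.
# Assumes it is invoked as: forcequit.sh {pid} {jvmNumber}
# Function names and the /tmp fallback are illustrative assumptions.
LOGFILE=${SPLSYSTEMLOGS:-/tmp}/forcequit.log

# Append a timestamped message to the log file.
log_msg() {
    echo "$(date +"%Y-%m-%d %H:%M:%S"): $*" >>"$LOGFILE"
}

# Succeeds only if the pid is a running process whose command line
# mentions cobjrun.
is_cobjrun() {
    ps -p "$1" -o args= 2>/dev/null | grep -q cobjrun
}

force_quit() {
    pid=$1
    jvm=${2:-unknown}   # the JVM number is optional
    if [ -z "$pid" ]; then
        log_msg "Process Id is required"
        return 1
    fi
    if is_cobjrun "$pid"; then
        log_msg "Child JVM $jvm: process $pid is an active cobjrun process -- issuing kill -9 $pid"
        kill -9 "$pid"
    else
        log_msg "Child JVM $jvm: process $pid is not an active cobjrun process -- kill will not be issued"
        return 1
    fi
}

# Run only when arguments are supplied, so the functions can also be sourced.
if [ $# -gt 0 ]; then
    force_quit "$@"
fi
```

Keeping the check and the kill in separate functions makes it possible to exercise the logging path against a harmless process before pointing the property at the script.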
To use this facility the following patches must be installed:
Patch 13719584 for Oracle Utilities Application Framework V2.1
Patches 13684595 and 13634933 for Oracle Utilities Application Framework V2.2
Group Fix 4 (as Patch 13640668) for Oracle Utilities Application Framework V4.1