Introduction
There is nothing more frustrating than a problem that "cannot be reproduced". Logs and configuration files have been analysed, but there just isn't enough information to establish the root cause. The issue may be closed, but you are left with the feeling that the problem will rear its ugly head again in the future. The trouble is, to resolve such issues you need to capture diagnostic data at the exact time the incident occurs. Step forward Fusion Middleware Diagnostic Framework!
Diagnostic Framework monitors WebLogic Managed Servers and delivers "Automatic capture of diagnostic data upon first failure". To quote from Oracle Fusion Middleware Administrator's Guide 11g Release 1 (11.1.1), Chapter 13 Diagnosing Problems:
"When a critical error occurs ... the Diagnostic Framework automatically collects diagnostics, such as thread dumps, DMS metric dumps, and WebLogic Diagnostics Framework (WLDF) server image dumps ... The data is stored in a file-based repository and is accessible with command-line utilities."
In other words, the data collected upon first failure, especially the thread and image dumps, provides a snapshot of the system as, or immediately after, the problem occurs. The table below shows the types of WebLogic Server issues which fall into the scope of Diagnostic Framework.
How to Configure Diagnostic Framework?
Depending on your Fusion Middleware product choice, you may not need to do anything! Diagnostic Framework is automatically installed, configured and initiated for any WebLogic Domain which has the Oracle Java Required Files (JRF) template applied. This template is applied by default whenever you configure WebLogic Managed Servers for products such as:
Portal / Forms / Reports / Discoverer
Identity Management (OID, OAM, OIM, etc.)
WebCenter
SOA
Check your WebLogic Domain directory structure. If you have an "adr" subdirectory under
DOMAIN_HOME/servers/<servername>/
then the JRF template has been applied and Diagnostic Framework will be in play.
Should the "adr" subdirectory not exist, review the advice given in the My Oracle Support article
How to Apply FMW (EM) Control and JRF to a WebLogic Domain and Managed Servers [ID 947043.1]
If you are working with a standalone WebLogic Server solution and applying Oracle JRF is not acceptable, consider using WLDF, the WebLogic Diagnostic Framework. (Fusion Middleware Diagnostic Framework makes use of WLDF under the covers.) A couple of useful links about WLDF are listed below:
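The directory check can also be scripted when a domain has many servers. The following is a minimal sketch in Python, assuming only the DOMAIN_HOME/servers/<servername>/adr layout described above; the function name servers_with_adr and the example domain path are hypothetical and not part of any Oracle tool.

```python
import os


def servers_with_adr(domain_home):
    """Return {server_name: bool} for each directory under DOMAIN_HOME/servers.

    The value is True when the server directory contains an "adr"
    subdirectory, which suggests the JRF template has been applied.
    """
    servers_dir = os.path.join(domain_home, "servers")
    result = {}
    if not os.path.isdir(servers_dir):
        return result
    for name in sorted(os.listdir(servers_dir)):
        server_dir = os.path.join(servers_dir, name)
        if os.path.isdir(server_dir):
            result[name] = os.path.isdir(os.path.join(server_dir, "adr"))
    return result


if __name__ == "__main__":
    # Hypothetical path: replace with your own DOMAIN_HOME.
    domain_home = "/oracle/middleware/user_projects/domains/mydomain"
    for server, has_adr in servers_with_adr(domain_home).items():
        print(server, "adr present" if has_adr else "adr missing")
```

Any server reported as "adr missing" is a candidate for the My Oracle Support advice that follows.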
Configuring and Using the Diagnostics Framework for Oracle WebLogic Server 11g
WebLogic Diagnostics Framework-A Very Useful Tool [A nice blog which describes a WLDF use case]
How to Get Started With Diagnostic Framework
To be frank, the Fusion Middleware Administrator's Guide is the best place to start your learning:
Oracle Fusion Middleware Administrator's Guide 11g Release 1 (11.1.1), Chapter 13 Diagnosing Problems
There is a lot of reading here, but if you are in a hurry and just want to get the right information to Oracle Support to help resolve your issue, check out the next section.
How to Upload Diagnostic Framework Incident Data to Oracle Support
Some Background Information
There are three interfaces to the Repository:
Enterprise Manager Cloud Control (Support Workbench)
WLST (Command Line)
ADRCI (Command Line)
Enterprise Manager Cloud Control provides a nice GUI to search, view and package Diagnostic Framework incidents. However, this software is not to be confused with Fusion Middleware (EM) Control. Cloud Control (formerly known as Grid Control) is part of the Enterprise Manager media package and has its own install and configuration story. Therefore, for the benefit of those yet to install and play with Cloud Control, I am going to describe how to use the command line tools.
Ideally, you would need only one command line interface, but currently I suggest using both, mainly because ADRCI SHOW INCIDENTS does not reveal the description behind the Diagnostic Framework error code.
Instructions:
Note:
WLST and ADRCI are case sensitive when it comes to handling parameter values. If you make a mistake, expect an unfriendly syntax error message.
1) Find the incident
Note:
The managed server which you are troubleshooting must be up and running. If the managed server is down, ensure the domain's Admin Server is accessible. If you cannot connect to the Admin Server or the Managed Server the example WLST commands will not work.
a) Launch WLST
Note: Use the WLST which resides in the "oracle_common" directory (not WL_HOME/common/bin), otherwise the Diagnostic Framework commands are not available and you will get an error like the one below
Traceback (innermost last):
  File "<console>", line 1, in ?
NameError: listIncidents
MW_HOME/oracle_common/common/bin/wlst.sh
b) Connect to the managed server or the admin server e.g.
wls:/offline> connect('weblogic','welcome1','t3://localhost:7020')
c) Run the command
wls:/MyDomain/serverConfig> listIncidents()
This will list the incidents for the server to which you have connected. If you have connected to the Admin Server and want to list the incidents for a managed server within the domain, use the command
wls:/MyDomain/serverConfig> listIncidents(adrHome='diag\ofm\MyDomain\MyManagedServer', server='MyManagedServer')
Example output
Incident Id   Problem Key   Incident Time
1             DFW-99998 [java.lang.NullPointerException][oracle.error.simulator.ErrorSimulator.createNullPointerException][errorWebApp_1-0-0-0]   Fri Nov 02 10:38:46 GMT 2012
The bracketed text following the DFW-99998 error code is the description you do not see when using the ADRCI SHOW INCIDENTS command.
Make a note of the incident id. You are now ready to move to step 2.
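If you record incidents in a ticket or spreadsheet, it can help to separate the error code from its description. The following is a minimal sketch in Python, assuming the problem key has the shape shown in the example output (an error code followed by bracketed parts); the function name split_problem_key is hypothetical and is not a WLST command.

```python
import re


def split_problem_key(problem_key):
    """Split a problem key into (error_code, [bracketed description parts])."""
    # The error code is everything up to the first space.
    code, _, rest = problem_key.strip().partition(" ")
    # Each description part is enclosed in square brackets.
    details = re.findall(r"\[([^\]]*)\]", rest)
    return code, details


if __name__ == "__main__":
    key = ("DFW-99998 [java.lang.NullPointerException]"
           "[oracle.error.simulator.ErrorSimulator.createNullPointerException]"
           "[errorWebApp_1-0-0-0]")
    code, details = split_problem_key(key)
    print(code)
    for part in details:
        print("  " + part)
```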
2) Package the incident
a) Set up the environment - example commands below are for Unix
cd <DOMAIN_HOME>/bin
. ./setDomainEnv.sh
If you want ADRCI to run a Remote Diagnostic Agent (RDA) collection (recommended) at generate package time, point ORACLE_HOME at oracle_common:
ORACLE_HOME=$MW_HOME/oracle_common; export ORACLE_HOME
To prevent ADRCI from running RDA at generate package time, point ORACLE_HOME at the WL_HOME/server/adr directory:
ORACLE_HOME=$WL_HOME/server/adr; export ORACLE_HOME
b) Launch adrci
$WL_HOME/server/adr/adrci
c) Set BASE and HOMEPATH
adrci> SET BASE /oracle/middleware/user_projects/domains/mydomain/servers/mymanagedserver/adr
adrci> SET HOMEPATH diag/ofm/mydomain/mymanagedserver
d) Optionally run SHOW INCIDENTS e.g.
adrci> SHOW INCIDENTS -MODE DETAIL
ADR Home = /oracle/middleware/user_projects/domains/mydomain/servers/mymanagedserver/adr/diag/ofm/mydomain/mymanagedserver:
*************************************************************************
**********************************************************
INCIDENT INFO RECORD 1
**********************************************************
   INCIDENT_ID                   1
   STATUS                        ready
   CREATE_TIME                   2012-11-02 10:38:46.468000 +00:00
   PROBLEM_ID                    1
   CLOSE_TIME                    <NULL>
   FLOOD_CONTROLLED              none
   ERROR_FACILITY                DFW
   ERROR_NUMBER                  99998
   ERROR_ARG1                    <NULL>
   ERROR_ARG2                    <NULL>
   ERROR_ARG3                    <NULL>
   ERROR_ARG4                    <NULL>
   ERROR_ARG5                    <NULL>
   ERROR_ARG6                    <NULL>
   ERROR_ARG7                    <NULL>
   ERROR_ARG8                    <NULL>
   ERROR_ARG9                    <NULL>
   ERROR_ARG10                   <NULL>
   ERROR_ARG11                   <NULL>
   ERROR_ARG12                   <NULL>
   SIGNALLING_COMPONENT          <NULL>
   SIGNALLING_SUBCOMPONENT       <NULL>
   SUSPECT_COMPONENT             <NULL>
   SUSPECT_SUBCOMPONENT          <NULL>
   ECID                          5162744c6a2eea5e:155ff445:13ac0aae7cb:-8000-0000000000000325
   IMPACTS                       0
1 rows fetched
e) Create a logical package
IPS CREATE PACKAGE INCIDENT incident_number
e.g.
adrci> IPS CREATE PACKAGE INCIDENT 1
Created package 1 based on incident id 1, correlation level typical
f) Generate the package
IPS GENERATE PACKAGE package_number IN path
e.g.
adrci> IPS GENERATE PACKAGE 1 IN /tmp
Generated package 1 in file /tmp/DFW99998j_20121102113633_COM_1.zip, mode complete
Note:
If the generate package command hangs, ADRCI may be experiencing an issue when running RDA. To avoid such trouble, exit ADRCI, point the ORACLE_HOME environment variable at WL_HOME/server/adr, then relaunch ADRCI and generate the package again.
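The values used in step 2 all follow from a few inputs: the Middleware home, the WebLogic home, the domain home, the domain name and the server name. The following is a minimal sketch in Python that derives ORACLE_HOME, BASE and HOMEPATH and prints the ADRCI commands to type, assuming the Unix directory layout used in the examples above. All function names are hypothetical helpers and the paths are examples only. The sketch only builds text; it does not launch ADRCI. Note that the package number is reported by IPS CREATE PACKAGE, so check it before running the generate step.

```python
import posixpath


def oracle_home_for_adrci(mw_home, wl_home, run_rda=True):
    """Pick the ORACLE_HOME to export before launching ADRCI.

    oracle_common lets ADRCI run RDA at generate package time;
    WL_HOME/server/adr prevents it (useful if the generate step hangs).
    """
    if run_rda:
        return posixpath.join(mw_home, "oracle_common")
    return posixpath.join(wl_home, "server", "adr")


def adr_locations(domain_home, domain_name, server_name):
    """Return (BASE, HOMEPATH) for a managed server's ADR repository."""
    base = posixpath.join(domain_home, "servers", server_name, "adr")
    homepath = posixpath.join("diag", "ofm", domain_name, server_name)
    return base, homepath


def adrci_packaging_commands(base, homepath, incident_id, package_id, out_dir):
    """Build the ADRCI commands from steps c), e) and f) as plain strings."""
    return [
        "SET BASE %s" % base,
        "SET HOMEPATH %s" % homepath,
        "IPS CREATE PACKAGE INCIDENT %d" % incident_id,
        # package_id is the number reported by the CREATE PACKAGE command.
        "IPS GENERATE PACKAGE %d IN %s" % (package_id, out_dir),
    ]


if __name__ == "__main__":
    # Example values only; substitute your own environment.
    base, homepath = adr_locations(
        "/oracle/middleware/user_projects/domains/mydomain",
        "mydomain",
        "mymanagedserver",
    )
    print("ORACLE_HOME=" + oracle_home_for_adrci(
        "/oracle/middleware", "/oracle/middleware/wlserver_10.3"))
    for command in adrci_packaging_commands(base, homepath, 1, 1, "/tmp"):
        print("adrci> " + command)
```

Remember that ADRCI is case sensitive about parameter values, so pass the domain and server names exactly as they appear on disk.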
3) Upload the package zip to Oracle Support via your Service Request
a) Log into My Oracle Support and locate your Service Request
b) Click on "Add Attachments"
c) Upload the zip file